JDBC plugin Feeder Mode

2015-01-18 Thread 4m7u1
Hi,

As far as I understand, the JDBC plugin in feeder mode is run as a bash 
script with parameters similar to the river's. The documentation says that 
it is a push model. Can anyone explain how it works? If new data is pushed 
into my database, what role does the feeder play from then on?

Thank you.

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/54750384-49bd-471d-8cac-86aa9f43a9fb%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


Re: Writing custom scripts for indexing data in Elasticsearch

2015-01-18 Thread 4m7u1
Thank you :)

On Friday, January 16, 2015 at 6:27:48 PM UTC+5:30, Jörg Prante wrote:
>
> "schedule" is triggering the JDBC plugin by wall clock time of the 
> machine, where "interval" simply waits the given time period between two 
> runs.
>
> Jörg
>
> On Fri, Jan 16, 2015 at 11:12 AM, Amtul Nazneen wrote:
>
>> Thank you. Is it the "interval" parameter or the "schedule" parameter? If I 
>> set the schedule parameter, Elasticsearch will poll the tables 
>> accordingly, right?
>>
>> On Wednesday, January 14, 2015 at 2:31:07 PM UTC+5:30, David Pilato wrote:
>>>
>>> I guess you need to set interval. See the plugin documentation on the home 
>>> page of the JDBC river.
>>>
>>> interval - a time value for the delay between two river runs (default: 
>>> not set)
>>>
>>> --
>>> David ;-)
>>> Twitter : @dadoonet / @elasticsearchfr / @scrutmydocs
>>>
>>> On 14 Jan 2015, at 06:01, Amtul Nazneen wrote:
>>>
>>> Okay. So the river runs only once when the script starts? And after 
>>> that, won't it keep running in the background to fetch updates according 
>>> to a schedule?
>>>
>>> On Monday, January 12, 2015 at 1:23:08 PM UTC+5:30, Ed Kim wrote:
>>>>
>>>> It executes once. You could consider running that script on a schedule 
>>>> and doing incremental updates using timestamps. 
>>>>
>>>> On Sunday, January 11, 2015 at 9:24:28 PM UTC-8, Amtul Nazneen wrote:
>>>>>
>>>>> Thank you. I have a doubt though, once I run the script, the river 
>>>>> plugin is started and the data gets indexed into Elasticsearch, I want to 
>>>>> know, if the plugin would be running after that, or does it stop once the 
>>>>> script execution comes to an end?
>>>>>
>
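
A minimal sketch of the incremental approach suggested in this thread (re-run
on a schedule and fetch only the rows changed since the previous run). The
orders table, its updated_at column and the function name are hypothetical,
and SQLite is used purely for illustration; this is not the JDBC plugin's own
mechanism.

```python
import sqlite3


def fetch_new_rows(conn, last_seen):
    """Return rows modified after last_seen, oldest first."""
    cur = conn.execute(
        "SELECT id, name, updated_at FROM orders "
        "WHERE updated_at > ? ORDER BY updated_at",
        (last_seen,),
    )
    return cur.fetchall()


# Hypothetical table and data, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER, name TEXT, updated_at TEXT)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [
        (1, "first", "2015-01-10 09:00:00"),
        (2, "second", "2015-01-11 09:00:00"),
        (3, "third", "2015-01-12 09:00:00"),
    ],
)

# The previous run stopped after row 1, so only rows 2 and 3 count as new.
last_seen = "2015-01-10 09:00:00"
new_rows = fetch_new_rows(conn, last_seen)

# Remember the newest timestamp so the next scheduled run starts from there.
if new_rows:
    last_seen = new_rows[-1][2]
```

Each scheduled run would then index only new_rows and persist last_seen
somewhere durable between runs.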


Re: Elasticsearch JDBC river plugin- Interval vs Schedule.

2015-01-18 Thread 4m7u1
Okay got it. Thanks :). And are both the same when it comes to performance?
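
For reference, the distinction between the two parameters discussed in this
thread ("interval" waits a fixed period between two runs, "schedule" fires by
the machine's wall clock) can be sketched as follows. This is only an
illustration: the function names are made up, and the schedule is assumed to
fire at second 0 of every minute.

```python
from datetime import datetime, timedelta


def next_run_interval(previous_run_finished, interval):
    """'interval' style: wait a fixed delay after the previous run."""
    return previous_run_finished + interval


def next_run_every_minute(now):
    """'schedule' style: fire on the wall clock, here at second 0 of each minute."""
    return now.replace(second=0, microsecond=0) + timedelta(minutes=1)


finished = datetime(2015, 1, 16, 16, 38, 3)
by_interval = next_run_interval(finished, timedelta(minutes=1))
by_schedule = next_run_every_minute(finished)
```

With these inputs the interval-style run is due at 16:39:03, while the
schedule-style run is due at 16:39:00 regardless of when the previous run
finished.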




Re: Elasticsearch JDBC river plugin metrics

2015-01-18 Thread 4m7u1
Thank you so much Jörg :) !

On Friday, January 16, 2015 at 6:08:49 PM UTC+5:30, Jörg Prante wrote:
>
> These are diagnostic messages that crept into one of the releases. The 
> latest version has metrics logging disabled; it must be enabled in the 
> settings.
>
> The metrics count the number of rows fetched from the database and print 
> them every minute. This is not the number of documents in ES.
>
> The metrics print a mean of the row count, so you can see that your 
> database sent 250 rows per second. They also count the data volume in 
> bytes and print the rate in megabytes per second, which is an 
> interesting number for throughput.
>
> Jörg
>
> On Fri, Jan 16, 2015 at 12:16 PM, 4m7u1 wrote:
>
>> I'm trying to run a river query which fetches 1 records scheduled at 
>> 1 minute interval. The first time it runs, metrics is 1 rows and after 
>> a gap of 1 minute that is (scheduled interval) metrics is 2 rows. What 
>> does this mean? Although the number of hits i get on querying the river 
>> index is 1 itself. Why do the metrics on rows keep on increasing by a 
>> factor of 1 each time? 
>>
>> [2015-01-16 16:38:03,406][INFO ][river.jdbc.RiverMetrics  ] pipeline 
>> org.xbib.elasticsearch.plugin.jdbc.RiverPipeline@20f4b6fe complete: river 
>> jdbc/my_jdbc_river metrics: 1 rows, 250.5428448620193 mean, (0.0 0.0 
>> 0.0), ingest metrics: elapsed 3 seconds, 3.37 MB bytes, 352.0 bytes avg, 1 
>> MB/s
>>
>> and also can anyone explain the below values?
>>
>> metrics: 1 rows, 250.5428448620193 mean, (0.0 0.0 0.0), 
>> ingest metrics: elapsed 3 seconds, 3.37 MB bytes, 352.0 bytes avg, 1 MB/s.
>>
>> Thank you.
>>
>>
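
As a rough check of the throughput figure in the quoted log line, dividing
the reported volume (3.37 MB) by the elapsed time (3 seconds) gives about
1.12 MB/s. That the plugin prints this rounded to "1 MB/s" is an assumption
here, not something taken from the plugin source.

```python
def throughput_mb_per_s(megabytes, elapsed_seconds):
    """Average data rate over the whole run."""
    return megabytes / elapsed_seconds


# Values copied from the quoted log line.
rate = throughput_mb_per_s(3.37, 3)
rounded_rate = round(rate)
```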


Re: Only send requests directly to data nodes and not master nodes?

2015-01-18 Thread Justin Zhu
Would all transport clients only connect to this client node? Right now we 
have them connecting to all 3 master nodes.

On Sunday, January 18, 2015 at 8:43:08 PM UTC-8, Mark Walkom wrote:
>
> It depends on your use, but try adding one client in with 8GB heap and see 
> how you go.
>
> On 19 January 2015 at 16:48, Justin Zhu wrote:
>
>> We give the master nodes 5gb of memory -- stats are showing low cpu & 
>> memory utilization. Would you still recommend the client only node? If so, 
>> how many & powerful?
>>
>>
>> On Saturday, January 17, 2015 at 6:55:12 PM UTC-8, Mark Walkom wrote:
>>>
>>> Depends, sounds like you need a few client nodes if you are OOMing your 
>>> masters (which, is a bad thing to happen to masters).
>>>
>>> On 18 January 2015 at 10:23, Justin Zhu  wrote:
>>>
>>>> We have a 9 node cluster, 3 masters, 6 data. We've been using the java 
>>>> transport client, which connects to all 3 masters. Occasionally the 
>>>> masters become unresponsive and need to be restarted. 
>>>>
>>>> Should the transport clients connect to the 6 data nodes directly 
>>>> instead?
>>>>
>>>> Thanks.



Different results with/without preference=_primary_first/_replica_first using count API

2015-01-18 Thread Xiaoting Ye
Hi,

I'm bulk indexing a large amount of data, and when I checked the status I 
found some interesting results.

When I called: curl -XGET 
'http://localhost:9200/my_index/my_type/_count?pretty' -d '{"query" : { 
"filtered": {"filter" : {"exists" : {"field": "visibility"}}}}}'

It returned: 
{
  "count" : 27395968,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "failed" : 0
  }
}

When I called: curl -XGET 
'http://localhost:9200/my_index/my_type/_count?pretty&preference=_primary_first' 
-d '{"query" : { "filtered": {"filter" : {"exists" : 
{"field": "visibility"}}}}}'

It returned:
{
  "count" : 36802421,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "failed" : 0
  }
}

As we can see, there is a huge difference between the two results above. 
So I thought it was caused by an inconsistency between the primary shards 
and the replicas.

Then I called: curl -XGET 
'http://localhost:9200/my_index/my_type/_count?pretty&preference=_replica_first' 
-d '{"query" : { "filtered": {"filter" : {"exists" : 
{"field": "visibility"}}}}}'

To my surprise, the count is roughly the same as with _primary_first (the 
slight difference is caused by the bulk indexing that is still running).

{
  "count" : 36867417,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "failed" : 0
  }
}

Any idea why this happens?

Thanks!
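
A small sketch of how the three count requests in this message could be
built programmatically, with a fully closed JSON body. The helper name is
hypothetical, and the code only constructs the URL and body; it does not send
anything to a cluster.

```python
import json
from urllib.parse import urlencode


def build_count_request(base_url, index, doc_type, field, preference=None):
    """Return (url, json_body) for a count of documents where field exists."""
    params = {"pretty": "true"}
    if preference is not None:
        params["preference"] = preference
    url = "%s/%s/%s/_count?%s" % (base_url, index, doc_type, urlencode(params))
    # Same "filtered" + "exists" query as in the curl commands above.
    body = {"query": {"filtered": {"filter": {"exists": {"field": field}}}}}
    return url, json.dumps(body)


requests_to_send = [
    build_count_request("http://localhost:9200", "my_index", "my_type",
                        "visibility", preference=pref)
    for pref in (None, "_primary_first", "_replica_first")
]
```

Building the body from a dictionary avoids hand-balancing the braces on the
command line.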



Re: Only send requests directly to data nodes and not master nodes?

2015-01-18 Thread Mark Walkom
It depends on your use, but try adding one client in with 8GB heap and see
how you go.

On 19 January 2015 at 16:48, Justin Zhu  wrote:

> We give the master nodes 5gb of memory -- stats are showing low cpu &
> memory utilization. Would you still recommend the client only node? If so,
> how many & powerful?
>
>
> On Saturday, January 17, 2015 at 6:55:12 PM UTC-8, Mark Walkom wrote:
>>
>> Depends, sounds like you need a few client nodes if you are OOMing your
>> masters (which, is a bad thing to happen to masters).
>>
>> On 18 January 2015 at 10:23, Justin Zhu  wrote:
>>
>>> We have a 9 node cluster, 3 masters, 6 data. We've been using the java
>>> transport client, which connects to all 3 masters. Occasionally the masters
>>> become unresponsive and need to be restarted.
>>>
>>> Should the transport clients connect to the 6 data nodes directly
>>> instead?
>>>
>>> Thanks.
>>>


Re: Only send requests directly to data nodes and not master nodes?

2015-01-18 Thread Justin Zhu
We give the master nodes 5gb of memory -- stats are showing low cpu & 
memory utilization. Would you still recommend the client only node? If so, 
how many & powerful?


On Saturday, January 17, 2015 at 6:55:12 PM UTC-8, Mark Walkom wrote:
>
> Depends, sounds like you need a few client nodes if you are OOMing your 
> masters (which, is a bad thing to happen to masters).
>
> On 18 January 2015 at 10:23, Justin Zhu wrote:
>
>> We have a 9 node cluster, 3 masters, 6 data. We've been using the java 
>> transport client, which connects to all 3 masters. Occasionally the masters 
>> become unresponsive and need to be restarted. 
>>
>> Should the transport clients connect to the 6 data nodes directly instead?
>>
>> Thanks.
>>


Re: Correct way to use TransportClient connection object

2015-01-18 Thread Subhadip Bagui
Hi,

In the same context: sometimes when I shut down Tomcat I get the exception 
below, and other times it works. Any idea why?

Jan 19, 2015 8:59:30 AM org.apache.catalina.core.StandardContext listenerStop
SEVERE: Exception sending context destroyed event to listener instance of class com.aricent.aricloud.es.service.ESClientFactory
java.lang.NoClassDefFoundError: org/elasticsearch/transport/netty/NettyTransport$4
    at org.elasticsearch.transport.netty.NettyTransport.doStop(NettyTransport.java:403)
    at org.elasticsearch.common.component.AbstractLifecycleComponent.stop(AbstractLifecycleComponent.java:105)
    at org.elasticsearch.transport.TransportService.doStop(TransportService.java:100)
    at org.elasticsearch.common.component.AbstractLifecycleComponent.stop(AbstractLifecycleComponent.java:105)
    at org.elasticsearch.common.component.AbstractLifecycleComponent.close(AbstractLifecycleComponent.java:117)
    at org.elasticsearch.client.transport.TransportClient.close(TransportClient.java:268)
    at com.aricent.aricloud.es.service.ESClientFactory.shutdown(ESClientFactory.java:118)
    at com.aricent.aricloud.es.service.ESClientFactory.contextDestroyed(ESClientFactory.java:111)



Re: How highlighting actually works?

2015-01-18 Thread Nikolas Everett
Highlighting is complex and more hacky than you'd imagine at first glance.
Each highlighter is different and we can't tell which one you are using
without seeing your mapping. For the plain highlighter the cost is roughly
proportional to the length of the highlighted field. So in your case it's
the cost to reanalyze every one of those pages.

You could return which page is matched pretty cheaply if you were willing
to write a plugin. Especially if you just wanted to know the first page or
something.

You could try using explain if you searched for text_content_*.  That'd
tell you which field matched.

Nik
On Jan 18, 2015 6:21 PM, "Karol Sikora"  wrote:

> Hi all,
>
> I have some specific requirements for highlighting. I need to search the
> full content of an item for a phrase, and then show on which page the
> searched phrase occurs. So I've created one field named text_content and
> fields named text_content_{page_number} (text_content_1, text_content_2,
> etc.).
> Example query:
> {
>     "highlight": {
>         "fields": {
>             "text_content_*": {}
>         }
>     },
>     "query": {
>         "match": {
>             "text_content": "lorem"
>         }
>     },
>     "size": 40
> }
>
> I've noticed that this query is fast, but only if I have a small number of
> documents in the index. Querying for documents is always fast (<40ms), but
> the highlight phase takes longer as the number of documents in the index
> grows.
> I've started thinking that highlighting may be processed before
> "size": 40 is applied, i.e. on all matched documents. Is that correct? How
> can I speed up such a case?
>
> Regards,
> Karol
>


How highlighting actually works?

2015-01-18 Thread Karol Sikora
Hi all,

I have some specific requirements for highlighting. I need to search the 
full content of an item for a phrase, and then show on which page the 
searched phrase occurs. So I've created one field named text_content and 
fields named text_content_{page_number} (text_content_1, text_content_2, 
etc.).
Example query:
{
    "highlight": {
        "fields": {
            "text_content_*": {}
        }
    }, 
    "query": {
        "match": {
            "text_content": "lorem"
        }
    }, 
    "size": 40
}

I've noticed that this query is fast, but only if I have a small number of 
documents in the index. Querying for documents is always fast (<40ms), but 
the highlight phase takes longer as the number of documents in the index 
grows.
I've started thinking that highlighting may be processed before 
"size": 40 is applied, i.e. on all matched documents. Is that correct? How 
can I speed up such a case?

Regards,
Karol
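
To illustrate the per-page field layout described above: once a search
response comes back, the page numbers can be read off the names of the
highlighted fields. The sketch below assumes the text_content_{page_number}
naming from this message; the helper name and the example hit are made up for
illustration.

```python
import re

# Matches the per-page fields only, not the combined text_content field.
PAGE_FIELD = re.compile(r"^text_content_(\d+)$")


def matched_pages(hit):
    """Return the sorted page numbers whose per-page field was highlighted."""
    pages = []
    for field_name in hit.get("highlight", {}):
        match = PAGE_FIELD.match(field_name)
        if match:
            pages.append(int(match.group(1)))
    return sorted(pages)


# Hand-written example shaped like one entry of hits.hits in a response.
example_hit = {
    "_id": "doc-1",
    "highlight": {
        "text_content_12": ["... <em>lorem</em> ipsum ..."],
        "text_content_3": ["<em>Lorem</em> dolor ..."],
    },
}
pages = matched_pages(example_hit)
```

For the example hit this yields pages 3 and 12.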



Re: Unicode characters and spaces in elasticsearch field names

2015-01-18 Thread joergpra...@gmail.com
You can find this in the source code.

E.g.

org.elasticsearch.index.mapper.ContentPath -> see delimiter variable, it is
'.' by default

org.elasticsearch.index.mapper.Uid -> see DELIMITER, it is set to '#'

and for '*'

org.elasticsearch.index.mapper.FieldMappersLookup and
org.elasticsearch.index.mapper.object.DynamicTemplate -> both use
org.elasticsearch.common.regex.Regex.simpleMatch on field names, a
simplified regex routine which supports *abc, a*bc, and abc* patterns

Jörg
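
To show why a literal '*' in a field name is awkward, here is a rough Python
illustration of the kind of simplified wildcard matching described above
(*abc, a*bc, abc*). It is an independent sketch, not a port of the
Elasticsearch routine, and the function name is chosen for illustration.

```python
def simple_match(pattern, text):
    """Match text against a pattern where '*' stands for any run of characters."""
    if "*" not in pattern:
        return pattern == text
    parts = pattern.split("*")
    head, tail = parts[0], parts[-1]
    # The text must start with whatever precedes the first '*'.
    if not text.startswith(head):
        return False
    rest = text[len(head):]
    # ...and end with whatever follows the last '*', without overlapping head.
    if len(rest) < len(tail) or not rest.endswith(tail):
        return False
    middle = rest[:len(rest) - len(tail)]
    # Any literal pieces between two '*' must appear in order in the middle.
    pos = 0
    for part in parts[1:-1]:
        idx = middle.find(part, pos)
        if idx < 0:
            return False
        pos = idx + len(part)
    return True
```

A field literally named "a*bc" would itself be read as a pattern by such a
routine and would also match "aXbc", which is one reason '*' is discouraged
in field names.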


On Sun, Jan 18, 2015 at 10:36 PM, George  wrote:

> Does anybody have an idea at least where in the elasticsearch code this is
> handled?
>
> Thanks!
>
> On Friday, January 16, 2015 at 9:21:34 AM UTC+1, George wrote:
>>
>>
>> Hello everybody,
>>
>>
>> I've researched a little bit what characters are allowed in elasticsearch
>> field names.
>> However, I couldn't find any official documentation only some posts which
>> mentioned that '.', '#' and '*' are discouraged. See
>> http://elasticsearch-users.115913.n3.nabble.com/Illegal-characters-in-elasticsearch-field-names-td4054773.html.
>>
>> I've  indexed some fields which contained spaces and unicode
>> characters with elasticsearch 1.4.2 ("lucene_version": "4.10.2"). I was
>> able to retrieve the documents with
>> term query without any problems.
>>
>> My question would be, are there any pitfalls when using unicode
>> characters and spaces in elasticsearch field names? or is this discouraged?
>>
>>
>> Many thanks,
>> George
>>


Re: Unicode characters and spaces in elasticsearch field names

2015-01-18 Thread George
Does anybody have an idea at least where in the elasticsearch code this is 
handled? 

Thanks!

On Friday, January 16, 2015 at 9:21:34 AM UTC+1, George wrote:
>
>
> Hello everybody, 
>
>
> I've researched a little bit what characters are allowed in elasticsearch 
> field names. 
> However, I couldn't find any official documentation only some posts which 
> mentioned that '.', '#' and '*' are discouraged. See 
> http://elasticsearch-users.115913.n3.nabble.com/Illegal-characters-in-elasticsearch-field-names-td4054773.html
> .
>
> I've  indexed some fields which contained spaces and unicode 
> characters with elasticsearch 1.4.2 ("lucene_version": "4.10.2"). I was 
> able to retrieve the documents with 
> term query without any problems. 
>
> My question would be, are there any pitfalls when using unicode characters 
> and spaces in elasticsearch field names? or is this discouraged?
>
>
> Many thanks, 
> George
>



Re: How to Limit Search With-In Selected Document ID or Document ID List

2015-01-18 Thread Adrien Grand
Hi,

This use-case typically looks like a join (search within the results of
another search request) so you should look at whether you can change the
way that you model your data in order to be able to use nested docs or the
parent/child functionality. Otherwise, there is no better way:
elasticsearch does not support general-purpose joins.

On Thu, Jan 15, 2015 at 5:50 PM, ATL  wrote:

> Use case: we have 10 million documents in a single Elasticsearch index on
> a single server. A user wants to search but limit the results to a given
> set of 1 million doc IDs, so ES should only return results within that
> 1 million doc ID filter.
>
> When searching, we divide the 1 million doc IDs into batches of 10K IDs,
> apply each batch with the search, and loop through all batches. As each
> result returns, we keep merging, and send one merged result to the end
> user on the website.
>
>
> Is there a better way to do this search that returns results faster and
> avoids the batching, when we want to limit the search to a selected
> 1 million doc IDs?
>
> Thanks
> -V
>
>
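
The batching approach described in the question could be sketched roughly as
follows. The helper names, the "title" field and the example query are
hypothetical, and the request body assumes the 1.x-style "filtered" query
with an "ids" filter; the code only builds the request bodies and does not
talk to a cluster.

```python
def chunked(ids, batch_size):
    """Yield consecutive slices of ids, each at most batch_size long."""
    for start in range(0, len(ids), batch_size):
        yield ids[start:start + batch_size]


def build_batch_query(text_query, id_batch):
    """Restrict text_query to the documents whose IDs are in id_batch."""
    return {
        "query": {
            "filtered": {
                "query": text_query,
                "filter": {"ids": {"values": id_batch}},
            }
        }
    }


doc_ids = [str(i) for i in range(25000)]  # small stand-in for the 1 million IDs
batches = list(chunked(doc_ids, 10000))
bodies = [build_batch_query({"match": {"title": "lorem"}}, b) for b in batches]
```

Each body would be sent as a separate search and the hits merged on the
client side, which is the cost the reply above suggests avoiding by
remodelling the data.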



-- 
Adrien Grand



Re: A question about keyword_marker

2015-01-18 Thread Adrien Grand
Tokens are stored in a hash table, which provides random access in constant
time so I would not worry too much about performance. However, these tokens
will be stored in memory so you should keep the size of the list reasonable.

On Sun, Jan 18, 2015 at 4:58 PM, Nassim  wrote:

> Hi all,
>
> I would like to know if there is a limitation of the number of words that
> we can give to the keyword_marker instruction ? And if there is a big
> impact on the performance of ES ?
>
> Thank you !
>



-- 
Adrien Grand



Re: Improving the default routing hash function

2015-01-18 Thread Adrien Grand
Hi Andrew,

This is indeed an issue. For your information, elasticsearch will switch to
murmur3 in the next major version. For backward compatibility, old indices
will still use DJB, but newly created indices will use murmur3. There is
more background about this issue at
https://github.com/elasticsearch/elasticsearch/pull/7954

I don't know why DJB was chosen, but I believe that the fact that it
performs well on incremental ids (0, 1, 2, 3, ...) and the default number
of shards played a role in this choice (wild guess).


On Sun, Jan 18, 2015 at 9:22 PM, Andrew White  wrote:

> I noticed that the default routing hash function is DJB. This function is
> particularly poor at routing when the input keys are short and only mildly
> different. For example, basic two-digit hex values "00" -> "FF" produce
> very large hot spots on clusters of size 11, 16, and 17, among others.
> By "large" I mean that on a uniform input distribution, the largest shard
> is over 2x larger (sometimes up to 4x!) than the smallest shard.
>
> I feel it is reasonable to assume from a usability standpoint that if the
> number of distinct routing keys is an order of magnitude larger than the
> modulus, the resulting document distribution across the shards will be
> uniform. In our case, we had 255 distinct routing keys over 17 shards and
> the smallest shard is 40% the size of the largest. Furthermore, we know
> that the number of documents per routing key is roughly the same.
>
> This almost feels like a bug (maybe it is). It is certainly unexpected.
> Something like FNV seems like a good alternative. I would add that the Java
> string hash alternative isn't much better in a lot of cases, especially
> for short inputs.
>
> Is there a particular reason for using DJB? Any chance of changing the
> default or including something like FNV out of the box? I would also
> suggest a note in the documentation about the potential for hot spots
> simply due to routing key selection. Any thoughts in general?
>
> Thanks,
> Andrew White



-- 
Adrien Grand



Improving the default routing hash function

2015-01-18 Thread Andrew White
I noticed that the default routing hash function is DJB. This function is
particularly poor at routing when the input keys are short and differ only
slightly. For example, basic two-digit hex values "00" -> "FF" produce very
large hot spots on clusters of size 11, 16, and 17, among others. By
"large" I mean that on a uniform input distribution, the largest shard is
over 2x larger (sometimes up to 4x!) than the smallest shard.

I feel it is reasonable to assume from a usability standpoint that if the
number of distinct routing keys is an order of magnitude larger than the
modulus, the resulting document distribution across the shards will be
roughly uniform. In our case, we had 255 distinct routing keys over 17
shards and the smallest shard is 40% the size of the largest. Furthermore,
we know that the number of documents per routing key is roughly the same.
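
A minimal sketch of the effect described above, assuming the classic djb2
variant (start at 5381, multiply by 33, add each character) and a plain
modulo over the shard count. The function names `djb2` and `shard_counts`
are illustrative and are not taken from the Elasticsearch source. It counts
how many of the 256 two-digit hex keys land on each shard:

```python
from collections import Counter


def djb2(key: str) -> int:
    """Classic djb2 string hash, kept to 32 bits (assumed variant)."""
    h = 5381
    for ch in key:
        h = (h * 33 + ord(ch)) & 0xFFFFFFFF
    return h


def shard_counts(keys, num_shards):
    """Number of keys routed to each shard via hash % num_shards."""
    counts = Counter(djb2(k) % num_shards for k in keys)
    return [counts[s] for s in range(num_shards)]


# Two-digit uppercase hex routing keys "00" .. "FF".
keys = [f"{i:02X}" for i in range(256)]

for n in (11, 16, 17):
    counts = shard_counts(keys, n)
    print(f"{n} shards: min={min(counts)} max={max(counts)} counts={counts}")
```

With 11 shards the skew can be checked by hand: for a two-character key the
hash is a constant plus 33*c1 + c2, and 33 is a multiple of 11, so the shard
depends only on the second character. Each shard then receives either 16 or
32 of the 256 keys.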

This almost feels like a bug (maybe it is). It is certainly unexpected.
Something like FNV seems like a good alternative. I would add that the Java
string hash alternative isn't much better in a lot of cases, especially
for short inputs.
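
For comparison, a sketch of 32-bit FNV-1a applied to the same kind of keys.
The constants are the standard FNV offset basis and prime; the helper names
and the plain modulo routing are assumptions for illustration, and nothing
here is Elasticsearch code. No particular distribution is claimed; the loop
just prints per-shard minimum and maximum so they can be compared against
whichever hash is in use:

```python
from collections import Counter

FNV_OFFSET_BASIS_32 = 0x811C9DC5
FNV_PRIME_32 = 0x01000193


def fnv1a_32(key: str) -> int:
    """32-bit FNV-1a over the UTF-8 bytes of the key."""
    h = FNV_OFFSET_BASIS_32
    for byte in key.encode("utf-8"):
        h ^= byte
        h = (h * FNV_PRIME_32) & 0xFFFFFFFF
    return h


def shard_counts(keys, num_shards, hash_fn=fnv1a_32):
    """Number of keys routed to each shard via hash_fn(key) % num_shards."""
    counts = Counter(hash_fn(k) % num_shards for k in keys)
    return [counts[s] for s in range(num_shards)]


# Two-digit uppercase hex routing keys "00" .. "FF".
keys = [f"{i:02X}" for i in range(256)]

for n in (11, 16, 17):
    counts = shard_counts(keys, n)
    print(f"{n} shards: min={min(counts)} max={max(counts)}")
```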

Is there a particular reason for using DJB? Any chance of changing the
default or including something like FNV out of the box? I would also
suggest a note in the documentation about the potential for hot spots
simply due to routing key selection. Any thoughts in general?

Thanks,
Andrew White




A question about keyword_marker

2015-01-18 Thread Nassim
Hi all,

I would like to know whether there is a limit on the number of words that
we can pass to the keyword_marker token filter, and whether a large list
has a big impact on the performance of ES.

Thank you!



Re: fielddata doesn't agree with _source for a "long" field

2015-01-18 Thread Sergey Tsalkov
Oh, long is actually an integer field, isn't it? I feel like such an idiot. 
Much gratitude to you!

On Sunday, January 18, 2015 at 1:35:22 AM UTC-8, David Pilato wrote:
>
> You should change the mapping for this field and use float or double: 
> http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/mapping-core-types.html#number
>
> David
>
> On 18 Jan 2015, at 10:25, Sergey Tsalkov wrote:
>
> Hey guys! I'm a newcomer and have been diving deep into Elasticsearch for
> the last week.
>
> Today, I've been trying to debug a maddening issue: I have a "long" field
> that contains decimals between 0 and 1, and sorting on it is not working.
> The records that are exactly 0 or exactly 1 show up in the right place, but
> any values in between are treated as if they were 0 for the purpose of
> sorting. I found the fielddata_fields setting, and it turns out my hunch
> was right: a given value would show up as 0.155676 in _source, but
> incorrectly show as 0 in the "fields" hash that I get from fielddata_fields.
>
> I can't figure out why, and Google comes up with nothing. Help would be
> appreciated!



Re: fielddata doesn't agree with _source for a "long" field

2015-01-18 Thread David Pilato
You should change the mapping for this field and use float or double: 
http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/mapping-core-types.html#number

David
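
A small sketch of what that change could look like. The type and field
names (`my_type`, `score`) are hypothetical, and the mapping body follows a
1.x-style layout with a type name, which is an assumption about the version
in use. The first lines only illustrate, by analogy in Python, why a decimal
between 0 and 1 ends up as 0 once it is stored as an integer type:

```python
import json

# Storing a decimal in an integer-typed field drops the fractional part,
# analogous to Python's int() truncation.
values = [0, 0.155676, 0.9, 1]
as_long = [int(v) for v in values]
print(as_long)  # [0, 0, 0, 1]

# Hypothetical mapping body that declares the field as a double instead.
mapping = {
    "mappings": {
        "my_type": {
            "properties": {
                "score": {"type": "double"}
            }
        }
    }
}
print(json.dumps(mapping, indent=2))
```

As far as I know, the type of an existing field cannot be changed in place,
so this would likely mean creating a new index with the corrected mapping
and reindexing the documents.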

> On 18 Jan 2015, at 10:25, Sergey Tsalkov wrote:
> 
> Hey guys! I'm a newcomer and have been diving deep into Elasticsearch for the
> last week.
>
> Today, I've been trying to debug a maddening issue: I have a "long" field
> that contains decimals between 0 and 1, and sorting on it is not working. The
> records that are exactly 0 or exactly 1 show up in the right place, but any
> values in between are treated as if they were 0 for the purpose of sorting. I
> found the fielddata_fields setting, and it turns out my hunch was right: a
> given value would show up as 0.155676 in _source, but incorrectly show as 0 in
> the "fields" hash that I get from fielddata_fields.
>
> I can't figure out why, and Google comes up with nothing. Help would be
> appreciated!



fielddata doesn't agree with _source for a "long" field

2015-01-18 Thread Sergey Tsalkov
Hey guys! I'm a newcomer and have been diving deep into Elasticsearch for
the last week.

Today, I've been trying to debug a maddening issue: I have a "long" field
that contains decimals between 0 and 1, and sorting on it is not working.
The records that are exactly 0 or exactly 1 show up in the right place, but
any values in between are treated as if they were 0 for the purpose of
sorting. I found the fielddata_fields setting, and it turns out my hunch
was right: a given value would show up as 0.155676 in _source, but
incorrectly show as 0 in the "fields" hash that I get from fielddata_fields.

I can't figure out why, and Google comes up with nothing. Help would be
appreciated!
