Re: cluster setup

2014-01-20 Thread Alexander Reelsen
Hey,

I am not a huge fan of the OOM killer, to be honest. However, something is
not going according to plan when the OOM killer kicks in.

You configured a 30GB heap, but the machine is running out of memory (in
that case the process using the most memory is usually the one killed,
which is obviously elasticsearch). But why are you running out of memory?
Is any other service running on that machine and eating up system memory?
Please check (or disable the OOM killer, but you should still find out why
it kicks in).

Also, use the nodes info API to find out whether the bootstrap.mlockall
setting is really in effect on your nodes.
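
The mlockall check can be scripted. A minimal sketch, assuming the nodes
info response has already been fetched and decoded into a dict, and that
each node entry reports the flag as process.mlockall; the sample response
and the helper name nodes_without_mlockall are made up for illustration:

```python
def nodes_without_mlockall(nodes_info):
    """Return the names of nodes whose process section does not report mlockall=true."""
    unlocked = []
    for node_id, node in nodes_info.get("nodes", {}).items():
        process = node.get("process", {})
        if not process.get("mlockall", False):
            # Fall back to the node id when the node entry has no name
            unlocked.append(node.get("name", node_id))
    return sorted(unlocked)


# Hypothetical, trimmed-down nodes info response for a three-node cluster
sample = {
    "nodes": {
        "a1": {"name": "host1", "process": {"mlockall": True}},
        "b2": {"name": "host2", "process": {"mlockall": False}},
        "c3": {"name": "host3", "process": {}},
    }
}

print(nodes_without_mlockall(sample))
```

Any node listed by the helper either reports mlockall as false or does not
report it at all, so those are the nodes to look at first.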


--Alex


On Thu, Jan 16, 2014 at 5:50 PM, Tula tulay.muezzino...@gmail.com wrote:

 Hi,

 I have 3 Ubuntu VMs on a private network, each with 64GB RAM. I started
 ES Beta2 (I need it for the term vector feature) on each node with 30GB
 heap space and with the following changes in the configuration file:

 discovery.zen.ping.multicast.enabled: false
 discovery.zen.ping.unicast.hosts: [host1,host2,host3]
 bootstrap.mlockall: true
 gateway.type: local
 gateway.recover_after_nodes: 2
 gateway.expected_nodes: 3
 discovery.zen.minimum_master_nodes: 2

 Sometimes ES processes get killed by the OOM killer, and when I restart a
 node I run into situations like:
  - host1, host2, host3 thinking all three nodes are connected and form a
    cluster, while host3 thinks it is all by itself with status red (names
    are random)
  - host1 and host2 form one cluster, and host2 and host3 form another
    cluster.
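
For reference, the usual rule of thumb for
discovery.zen.minimum_master_nodes is a majority of the master-eligible
nodes. A small sketch of that arithmetic (the function name quorum is only
illustrative); for the three nodes described here it gives 2, which matches
the value in the configuration quoted above:

```python
def quorum(master_eligible_nodes):
    """Majority of master-eligible nodes: floor(n / 2) + 1."""
    if master_eligible_nodes < 1:
        raise ValueError("need at least one master-eligible node")
    return master_eligible_nodes // 2 + 1


for n in (1, 2, 3, 5):
    print(n, "nodes -> minimum_master_nodes =", quorum(n))
```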

 Any idea what I am doing wrong, and any suggestions? I will need to
 index 20 million documents and have 3 separate indexes with 5 shards and
 1 replica.

 Thanks,
 T


  --
 You received this message because you are subscribed to the Google Groups
 elasticsearch group.
 To unsubscribe from this group and stop receiving emails from it, send an
 email to elasticsearch+unsubscr...@googlegroups.com.
 To view this discussion on the web visit
 https://groups.google.com/d/msgid/elasticsearch/62130b83-40aa-468e-9449-0046497d39ff%40googlegroups.com
 .
 For more options, visit https://groups.google.com/groups/opt_out.




Re: total.store.size_in_bytes measures what?

2014-01-20 Thread Alexander Reelsen
Hey,

the source is just a field in the index, that's the reason it is
included. What is not included is something like the translog, so the
number is not the entire disk space used by an index, iirc.


--Alex


On Thu, Jan 16, 2014 at 6:52 PM, Ryan Pedela rped...@datalanche.com wrote:

 I did more digging. Turns out that using version 0.90.9, the _source data
 is included in the calculation. In other words, the stats are the entire
 disk space used by an index including source data. And it is broken down by
 indices, primaries, etc as Alex said.

 I did not test to see if it takes into account source data compression,
 but it appears that it does.





Re: Query score based on aggregated values

2014-01-20 Thread Alexander Reelsen
Hey,

you could execute a query sorted by price and, in the same request, run a
statistical facet on the price field. Then check in your client, for each
hit returned, whether it is above the average value returned by the
statistical facet.

You could also do this in two roundtrips: get the statistical average
from the facet first, then execute a second query that filters for
products with a price higher than the average.
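
The client-side check from the first suggestion could look roughly like
this. It is a sketch only: it assumes the search response has been decoded
into a dict, that the statistical facet was named price_stats in the
request (a made-up name), and that the facet result carries a mean value.
The sample response is built by hand from the example prices in the quoted
question.

```python
def hits_above_average(response, facet_name="price_stats", field="price"):
    """Return the hits whose field value is above the facet's mean."""
    mean = response["facets"][facet_name]["mean"]
    return [hit for hit in response["hits"]["hits"]
            if hit["_source"][field] > mean]


# Hand-built stand-in for a search response (not real server output)
rows = [(1, "Chair", 5.99), (2, "Chair", 5.99), (3, "Chair", 59.99),
        (4, "Desk", 61.00), (5, "Desk", 60.00), (6, "Desk", 59.99)]
prices = [price for _, _, price in rows]
sample = {
    "hits": {"hits": [{"_id": str(i), "_source": {"name": n, "price": p}}
                      for i, n, p in rows]},
    "facets": {"price_stats": {"count": len(prices),
                               "mean": sum(prices) / len(prices)}},
}

print([hit["_id"] for hit in hits_above_average(sample)])
```

Note that this compares every hit against the overall average, as in the
reply; it does not compute a separate average per item name.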


--Alex


On Fri, Jan 17, 2014 at 12:43 AM, Kevin Pearson
kevin.pearso...@gmail.com wrote:

 I am wondering if there is a way to use aggregated values inside a query.

 Example:
 Say our data contains items and their price:
 {
  id : string
  name : string
  price : float
 }

 I want to do a query that returns the top items that have a price far from
 the average price of items with the same name.

 Example Data:
 ID | Name  | Price
 1  | Chair |  5.99
 2  | Chair |  5.99
 3  | Chair | 59.99
 4  | Desk  | 61.00
 5  | Desk  | 60.00
 6  | Desk  | 59.99

 The top response would be ID 3, since 59.99 is way higher than the average
 price for a chair.

 I believe I need to write a custom score script, but I am not sure how I
 can get a reference to the average of items with the same name.

 Thank you,
 Kevin





Re: Disable increment of version counter on some update operations possible?

2014-01-20 Thread joa
Thanks for your effort, Brian.
I'll think about this (I'm working with node.js not with Java anyway), but 
I already opened an issue for this 
(https://github.com/elasticsearch/elasticsearch/issues/4791).



Questions about multi_field, configurations, routing control, filtered alias

2014-01-20 Thread Ivan Ji
Hi, all

I have recently started studying ElasticSearch and have several questions
about it. I hope someone can answer them.

(1) About multi_field: can it hold fields of two different types? Such as:

"tweet" : {
    "properties" : {
        "name" : {
            "type" : "multi_field",
            "fields" : {
                "name" : { "type" : "string", "index" : "not_analyzed" },
                "value" : { "type" : "int" }

(2) If it can, what is the request format when posting a new document? Can
I explicitly specify the values of these two fields, or is some type cast
performed internally?

(3) Is there a default configuration file that defines the default schema
mappings of an index and type? Or can indices and mappings only be created
and configured through the REST API?

(4) After I have configured the number of shards/replicas and posted many
documents, can I reconfigure it? If so, how? What happens when the shard
count increases? Does it cost a lot of performance?

(5) About routing: can I force documents to be sent to different shards? I
know I can use the same routing value to index/search within the same
shard. But can I make sure that some documents are located in different
shards from other documents?

(6) Assume I have only one node and one index. What is the difference
between having one shard and ten shards for that index? Does it cost extra
memory if there are ten shards? What is the suggested rule for deciding
this number?

(7) What is the difference between setting the search_type to scroll and
using the from/size parameters?

(8) About alias filtering: what does creating an alias filter cost? Are
there any caching mechanisms that speed up operations using the alias
filter? Or is the filter condition of the filtered alias simply appended
to the query?


Sorry for the newbie questions. Could you give me your opinion on them?

Cheers,

Ivan



jdbc-river error: no sqljdbc_auth in java.ibrary.path for sqlserver dll

2014-01-20 Thread Alfredo Serafini


Hi to all

I have the following problem with sqlserver authentication.


I am using ElasticSearch with SQL Server in mixed authentication mode; I
define my indexing with the river plugin as something like:
 curl -XPUT 'http://localhost:9200/_river/itra_jdbc_river/_meta' -d '
{
  "type": "jdbc",
  "jdbc": {
    "driver": "com.microsoft.sqlserver.jdbc.SQLServerDriver",
    "url": "jdbc:sqlserver://MY_SERVER;integratedSecurity=true;databaseName=MY_DB",
    "user": "INTRANET\\MY_USER",
    "password": "MY_PWD",
    "sql": "SELECT * FROM A_TABLE",
    "versioning": false
  },
  "index": {
    "index": "test",
    "type": "values"
  }
}'

and I started ES with the -Djava.library.path=mssql\auth\x64 option, where
the mssql folder is under the jdbc-river plugin folder.

However, I still get the "no sqljdbc_auth in java.library.path" error, so
the dll does not seem to be referenced correctly. I also noticed that the
jar for sqlserver must instead be in the jdbc-river folder itself.

Any suggestions?

thanks in advance,
Alfredo



Synonym Filter

2014-01-20 Thread paul
Hi,

My Synonym file contains the entry as below

MIT,Massachusetts Institute of Technology

My setting is as below:

   "settings":{
      "analysis":{
         "analyzer":{
            "synonym":{
               "tokenizer":"my_pipe_analyzer",
               "filter":[
                  "lowercase",
                  "syns_filter"
               ]
            },
            "my_pipe_analyzer":{
               "tokenizer":"my_pipe_analyzer"
            },
            "autocomplete_search":{
               "type":"custom",
               "tokenizer":"my_pipe_analyzer",
               "filter":[
                  "lowercase",
                  "syns_filter",
                  "stop"
               ]
            }
         },
         "tokenizer":{
            "my_pipe_analyzer":{
               "type":"pattern",
               "pattern":"\\|"
            }
         },
         "filter":{
            "syns_filter":{
               "synonyms_path":"synonyms/synonym_collegename.txt",
               "type":"synonym",
               "ignore_case":true
            }
         }
      }
   }

I have created a pipe-separated tokenizer so that the synonyms are not
split on spaces, but they are still split on spaces when I verify it with
the analyze API. Below is my output from the analyze API.

{
   "tokens":[
      {
         "token":"mit",
         "start_offset":0,
         "end_offset":3,
         "type":"SYNONYM",
         "position":1
      },
      {
         "token":"massachusetts",
         "start_offset":0,
         "end_offset":3,
         "type":"SYNONYM",
         "position":1
      },
      {
         "token":"institute",
         "start_offset":0,
         "end_offset":3,
         "type":"SYNONYM",
         "position":2
      },
      {
         "token":"technology",
         "start_offset":0,
         "end_offset":3,
         "type":"SYNONYM",
         "position":4
      }
   ]
}



Re: Disable increment of version counter on some update operations possible?

2014-01-20 Thread joa
Hi Jörg, 

as I understand it, even with external versioning it's not possible to
update a doc *without changing/incrementing* the version number at all.

1. User A loads DOC123 with version 20 and locally starts editing critical
fields
2. User B simply loads DOC123 to view it (read only). A view counter is
incremented, so the version number is also set to 21 (or something
higher with external versioning, but not to 20 again)
3. User A tries to send the updates from (1) with version number 20 to
ensure he has the current version, and the update fails because the
version number has changed
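
The three steps can be replayed with a toy model. This is only a sketch of
version-checked writes in general, not Elasticsearch code: VersionedStore
and VersionConflict are made-up names, and the versions start at 1 instead
of 20 to keep the example short.

```python
class VersionConflict(Exception):
    """Raised when a write carries a stale expected version."""


class VersionedStore:
    """Toy in-memory store in which every successful write bumps the version."""

    def __init__(self):
        self._docs = {}  # doc_id -> (version, source)

    def get(self, doc_id):
        return self._docs[doc_id]

    def index(self, doc_id, source, expected_version=None):
        current_version = self._docs.get(doc_id, (0, None))[0]
        if expected_version is not None and expected_version != current_version:
            raise VersionConflict(
                f"expected {expected_version}, current is {current_version}")
        self._docs[doc_id] = (current_version + 1, source)
        return current_version + 1


store = VersionedStore()
store.index("DOC123", {"title": "draft", "views": 0})           # version 1

version_a, doc_a = store.get("DOC123")                          # step 1: user A loads
version_b, doc_b = store.get("DOC123")                          # step 2: user B views
store.index("DOC123", dict(doc_b, views=doc_b["views"] + 1))    # counter write -> version 2

try:                                                            # step 3: user A saves
    store.index("DOC123", dict(doc_a, title="final"),
                expected_version=version_a)
    outcome = "updated"
except VersionConflict as exc:
    outcome = f"conflict: {exc}"

print(outcome)
```

The view-counter write is what invalidates user A's version, which is
exactly the behaviour described in step 3.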

Thanks
Joa
 



Re: jdbc-river error: no sqljdbc_auth in java.ibrary.path for sqlserver dll

2014-01-20 Thread Alfredo Serafini
Hi

a little update: adding the absolute path to the Path variable worked...
so it seems that ES is currently ignoring the -Djava.library.path
parameter passed on the command line. Is that possible?

On Monday, January 20, 2014 14:02:36 UTC+1, Alfredo Serafini wrote:


 Please, use absolute paths in java.library.path

 With other java applications a relative path works without too much 
 problems, so this was my first test. I've tested it also with absolute 
 path, without luck.  I've also tried with '\\' instead of '\', or with '/', 
 just to avoid problems with windows paths.

 Any other suggestion?

  

 Of course, JDBC jars must be in the jdbc-river folder, otherwise they can 
 not be found by ES plugin manager.

 done, in fact removing the jar, the error changes to no suitable 
 driver... etc etc

 thanks,
 Alfredo




Re: jdbc-river error: no sqljdbc_auth in java.ibrary.path for sqlserver dll

2014-01-20 Thread joergpra...@gmail.com
You have to add parameters for the Java JVM in the JAVA_OPTS variable, e.g.
in $ES_HOME/bin/elasticsearch.in.sh

For Windows I don't know where to set JAVA_OPTS but maybe there is
something like $ES_HOME/bin/elasticsearch.bat

Jörg


On Mon, Jan 20, 2014 at 2:42 PM, Alfredo Serafini ser...@gmail.com wrote:

 Hi

 a little update: adding the reference to the absolute path in the Path
 variable worked... thus seems like ES is currently ignoring the
 -Djava.library.path parameter passed from command line. Is that possible?





Re: Is there any way to remove duplicated search result in ES?

2014-01-20 Thread yang ming
Thank you for your rapid reply.

It is true that I can write my own custom search action, but I cannot
override the default search action, so it is not what I want.
At indexing time there are several listeners that plugins can hook into,
but at search time there is hardly any listener for extending the search
operation except the search action.
Why not provide a way to install my own plugin to extend the search
phase? From the source code it seems to be simple.

Following your answer, I will give up on the solution using the Lucene
duplicate filter.

Your proposal is very useful for solving my problem. I will try it.
Thank you very much!


On Monday, January 20, 2014 at 6:45:40 PM UTC+8, Jörg Prante wrote:

 It is not true there is no chance to install my plugin after ES have 
 collected all search result. You can implement a plugin with an 
 alternative search action. The issue you have cited is related to 
 overriding default actions and there is good reason in not allowing that.

 The Lucene DuplicateFilter works on segment level and is not suitable for 
 index level and not for distributed search.

 The basic idea is, if you want the newest one of documents, you can sort 
 docs by timestamp, and pick the first one, ignoring the followers.

 You can use aggregations plus filtered queries to issue a series of 
 queries against an ES index and deduplicate it at client side, using your 
 custom rules of ordering (e.g. one bucket per author, and pick at most one 
 doc per author from sorted timestamped result set of a filtered query). 
 Note, this procedure is very expensive, and does not scale.

 The best method is indexing deduplicated data, which is the most preferred 
 solution, because it is cheap: fetch the list of docs per author from the 
 original source and index only the one you want to have in search results.

 Jörg
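
The "sort by timestamp and pick the first one" idea from the quoted reply
can be sketched on the client side as follows. The field names author and
timestamp and the helper newest_per_key are assumptions for illustration;
the same selection could just as well be done before indexing, which is
what the quoted reply prefers.

```python
def newest_per_key(docs, key="author", timestamp="timestamp"):
    """Keep only the newest document for each distinct key value."""
    seen = set()
    result = []
    # Newest first, so the first document seen per key is the one to keep
    for doc in sorted(docs, key=lambda d: d[timestamp], reverse=True):
        if doc[key] not in seen:
            seen.add(doc[key])
            result.append(doc)
    return result


docs = [
    {"id": 1, "author": "ann", "timestamp": 10},
    {"id": 2, "author": "bob", "timestamp": 12},
    {"id": 3, "author": "ann", "timestamp": 15},
]

print([doc["id"] for doc in newest_per_key(docs)])
```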




Re: jdbc-river error: no sqljdbc_auth in java.ibrary.path for sqlserver dll

2014-01-20 Thread joergpra...@gmail.com
Would love to help, also to Windows elasticsearch.bat specific problems,
but I'm afraid I can't.

Fact is, you have to find out where the Java Runtime of Elasticsearch is
executed - it is in the script elasticsearch or in case of Windows
elasticsearch.bat - and in that call, you must add JVM arguments like
java.library.path, and because the library loading is executed from a
daemon process, you have to choose absolute paths, so it can not fail. ES
provides JAVA_OPTS variable for convenience to collect several JVM
arguments defined in scripts.

Example

-Djava.library.path=C:\Users\Dummy\Downloads\sqljdbc_4.0\enu\auth\x64
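
Since the advice is to use absolute paths, a start script could refuse a
relative directory before building the JVM argument. A small sketch using
Python's ntpath module, which applies Windows path rules on any platform;
the helper name library_path_option is made up:

```python
import ntpath


def library_path_option(directory):
    """Build the -Djava.library.path argument, rejecting relative Windows paths."""
    if not ntpath.isabs(directory):
        raise ValueError(
            f"java.library.path should be absolute, got {directory!r}")
    return "-Djava.library.path=" + directory


print(library_path_option(r"C:\Users\Dummy\Downloads\sqljdbc_4.0\enu\auth\x64"))
```

Called with a relative directory such as mssql\auth\x64, the helper raises
ValueError instead of producing an option that may silently fail later.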

Jörg



Completion suggester for multiple fields

2014-01-20 Thread Angel Cross
Hello,

one small newbie question. Is it possible to use completion suggester for 
more than one field? Assuming all of them are of type completion.
Something like this 

{
    "song-suggest" : {
        "text" : "n",
        "completion" : {
            "fields" : ["suggest", "name", "author", "smthElse"]
        }
    }
}
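
One possible workaround, offered as an assumption rather than a confirmed
answer: if the completion suggester only takes a single field, a request
can still carry several named suggestions, one per field, and the client
can merge the returned options. The helper names, the suggest- prefix and
the sample response below are all made up for illustration.

```python
def build_suggest_body(text, fields):
    """One named completion suggestion per field, all in a single request body."""
    return {
        "suggest-" + field: {"text": text, "completion": {"field": field}}
        for field in fields
    }


def merge_options(response):
    """Collect option texts from all suggestions, de-duplicated, best score first."""
    best = {}
    for entries in response.values():
        if not isinstance(entries, list):
            continue  # skip non-suggestion keys such as "_shards"
        for entry in entries:
            for option in entry.get("options", []):
                text, score = option["text"], option.get("score", 0.0)
                best[text] = max(score, best.get(text, score))
    return sorted(best, key=lambda text: (-best[text], text))


# Hand-written stand-in for a suggest response (not real server output)
sample_response = {
    "_shards": {"total": 5, "successful": 5, "failed": 0},
    "suggest-name": [{"text": "n", "options": [
        {"text": "Nevermind", "score": 2.0}]}],
    "suggest-author": [{"text": "n", "options": [
        {"text": "Nirvana", "score": 3.0},
        {"text": "Nevermind", "score": 1.0}]}],
}

print(sorted(build_suggest_body("n", ["name", "author"])))
print(merge_options(sample_response))
```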


Thanks.



1.0.0.0.RC1 breaking changes: multi_field documentation

2014-01-20 Thread InquiringMind
Looking at the following to prepare for RC1: 
http://www.elasticsearch.org/guide/en/elasticsearch/reference/master/_multi_fields.html

I find the following. But shouldn't the semicolon after the first
"type": "string" be a comma?

Brian

you can now write:

"title": {
    "type": "string";
    "fields": {
        "raw":   { "type": "string", "index": "not_analyzed" }
    }
}
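
A quick way to confirm the suspicion is to feed both variants to a JSON
parser. This sketch wraps the snippet in an outer object so that it is a
complete document, and uses Python's standard json module:

```python
import json

with_semicolon = (
    '{"title": {"type": "string"; '
    '"fields": {"raw": {"type": "string", "index": "not_analyzed"}}}}'
)
with_comma = with_semicolon.replace('"string";', '"string",')


def is_valid_json(text):
    """Return True if the text parses as JSON, False otherwise."""
    try:
        json.loads(text)
        return True
    except ValueError:  # json.JSONDecodeError is a subclass of ValueError
        return False


print(is_valid_json(with_semicolon), is_valid_json(with_comma))
```

Only the variant with the comma parses, so the semicolon in the
documentation snippet looks like a typo.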



1.0.0RC1 breaking changes : GC stats

2014-01-20 Thread Roy Russo
Well, not sure if it qualifies as a breaking change, unless you cowboy code 
javascript like I do ;-)

The node stats api (/_nodes/{this.nodeId}/stats?all=1) is returning a 
different format for JVM GC now, splitting old and young generations (which 
is really helpful)

I didn't notice this change in Beta1, so I thought I'd point it out and 
maybe someone can/might update the doc.

"gc": {
   "collectors": {
      "young": {
         "collection_count": 3,
         "collection_time_in_millis": 136
      },
      "old": {
         "collection_count": 0,
         "collection_time_in_millis": 0
      }
   }


I also didn't notice previously the jvm returning pool information (another 
helpful addition!)

   "pools": {
      "young": {
         "used_in_bytes": 30180776,
         "max_in_bytes": 279183360,
         "peak_used_in_bytes": 71630848,
         "peak_max_in_bytes": 279183360
      },
      "survivor": {
         "used_in_bytes": 8912888,
         "max_in_bytes": 34865152,
         "peak_used_in_bytes": 8912896,
         "peak_max_in_bytes": 34865152
      },
      "old": {
         "used_in_bytes": 21765456,
         "max_in_bytes": 724828160,
         "peak_used_in_bytes": 21765456,
         "peak_max_in_bytes": 724828160
      }
   }
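
For clients that need to adapt, reading the new layout could look like the
sketch below. It works on the gc and pools objects exactly as shown in
this message; where they sit inside the full node stats response is not
shown here, so the helpers take those objects directly. The helper names
are made up.

```python
def gc_summary(gc):
    """Map each collector name to (collection_count, collection_time_in_millis)."""
    return {
        name: (stats["collection_count"], stats["collection_time_in_millis"])
        for name, stats in gc["collectors"].items()
    }


def pool_usage(pools):
    """Map each memory pool name to used/max as a fraction between 0 and 1."""
    return {
        name: pool["used_in_bytes"] / pool["max_in_bytes"]
        for name, pool in pools.items()
    }


# Values copied from the excerpts in this message
gc = {"collectors": {
    "young": {"collection_count": 3, "collection_time_in_millis": 136},
    "old": {"collection_count": 0, "collection_time_in_millis": 0},
}}
pools = {
    "young": {"used_in_bytes": 30180776, "max_in_bytes": 279183360},
    "survivor": {"used_in_bytes": 8912888, "max_in_bytes": 34865152},
    "old": {"used_in_bytes": 21765456, "max_in_bytes": 724828160},
}

summary = gc_summary(gc)
total_gc_millis = sum(millis for _, millis in summary.values())
print(summary, total_gc_millis)
print({name: round(ratio, 3) for name, ratio in pool_usage(pools).items()})
```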





Re: Disable increment of version counter on some update operations possible?

2014-01-20 Thread joa
Thanks Clinton 



how can I access script field in map script of facet script plugin??

2014-01-20 Thread Kajal Patel
I need to use the script field value in my map script, but I am not sure
how I can access it.
My query looks like this; I need to use the value of state_info in my
map_REPORT script. Is it possible to access that?
{
  "query": {
   ---
  },
  "script_fields": {
    "state_info" : { "script" : "lookup", "lang" : "native", "parems" : {
    "field" : "state" } }
  },
  "facets": {
    "reportFacet": {
      "script": {
        "init_script": "init_REPORT",
        "map_script": "map_REPORT",
        "reduce_script": "reduce_REPORT",
        "params": {
         --
        }
      }
    }
  }
}



Re: Questions about multi_field, configurations, routing control, filtered alias

2014-01-20 Thread binh . ly
Ivan,

1) The multi_field type allows you to define different ways in which a
*single field value* will be indexed. Your example below will work and
will index a single value as string/not_analyzed, and then as an int (use
"integer" rather than "int" as the type name).

2) The incoming document will contain a field named "name" with a single
value. When it goes into the index, it will be indexed in 2 different ways.

3) A mapping is not required to index data. If you don't specify one
up-front, an implied default mapping is used: your JSON content is parsed
and the schema is updated dynamically.

4) You cannot change the shard count after the index is created. You can
change the replica count at any time through the update index settings
API (the index.number_of_replicas setting).

5) You can specify a single routing value for all documents that you want
to go to a specific shard/location.

6) The number of shards determines how far you can scale out later. If
your data volume increases, you can add more nodes and distribute the
shards across them. If you only have a single shard and you run out of
space, you cannot scale out unless you increase storage or move to an
index with more shards.

7) Scroll is used for a snapshot type of search, i.e. the results you get
back are not affected by updates made to the index after you start
scrolling. From/size is useful for paging through search results (or for
infinite scrolling, fetched one page at a time).

8) Filters execute fast and, yes, can be cached.
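
To illustrate points 4 and 5: routing boils down to hashing the routing
value and taking it modulo the number of shards. The sketch below uses
zlib.crc32 purely as a stand-in hash (it is not the function Elasticsearch
uses) to show why equal routing values always land on the same shard and
why the shard count cannot simply be changed afterwards.

```python
import zlib


def shard_for(routing, number_of_shards):
    """Pick a shard from a routing value; crc32 is only an illustrative hash."""
    return zlib.crc32(routing.encode("utf-8")) % number_of_shards


keys = [f"user{i}" for i in range(100)]

# The same routing value always maps to the same shard
assert all(shard_for(k, 5) == shard_for(k, 5) for k in keys)

# With a different shard count, many documents would belong somewhere else
moved = sum(1 for k in keys if shard_for(k, 5) != shard_for(k, 6))
print(f"{moved} of {len(keys)} routing values map to a different shard")
```

Documents that share a routing value end up together, but the exact shard
number is an outcome of the hash and not something the client picks.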

On Monday, January 20, 2014 6:21:43 AM UTC-5, Ivan Ji wrote:

 Hi, all

 Recently, I am studying the ElasticSearch. I have several questions about 
 it. Hope someone can answer me.

 (1) About the multi_field, can it store two type of fields ? such as..

 tweet : {
 properties : {
 name : {
 type : multi_field,
 fields : {
 name : { type : string, index : not_analyzed },
 value : { type : int}

 (2) if it can, what's the query format when post a new document? Could I 
 explicit specify the value of these two fields? Or there are some type cast 
 operations inside it?

 (3) Does there any default configuration file exist that configure the 
 default schema mappings of the index and type? Does it only support REST 
 API to create index/configure the mappings?

 (4)After I configured the number of shards/replicas and post many 
 documents into it, can I re-configure it again? And how ? if so, what 
 happened when the shard number increase? Do it cost a lots of performance?

 (5)About the routing, can I control the documents that must be sent to 
 different shards? I know I can use the same routing value to index/search 
 in the same shard. But could I control some documents which must be located 
 in different shards of the other documents?

 (6) Assume I have only one node and one index, what's the difference 
 between the size of shard is only one and ten of the same index? Does it 
 cost extra memory if the shards size is ten? What's the suggested rule to 
 decide this size?

 (7) What's the difference between setting the search_type to scroll and 
 using the parameters(from/size)?

 (8) About the alias filtering, what's the cost about creating a alias 
 filter? Are there any cache algorithms to accelerate these operations using 
 the alias filter? Or it just append the extra filter condition of the 
 filtered alias in the query? 


 Sorry for the newbie questions, could you give me some opinion about these 
 questions?

 Cheers,

 Ivan




Unexpected behavior from nested - filter - nested aggregation

2014-01-20 Thread John Freeman
First, let me say I'm very excited about the new aggregations. Great work!

I've got a type with two layers of nesting:

script:
  calls: [
    name: string
    params: [
      name: string
      value: string
    ]
  ]

I want to run an aggregation over the parameter values for calls to a 
specific function. Here's the skeleton of what I tried:

'aggs': {'b': {
  'nested': {'path': 'calls'},
  'aggs': {'c': {
    'filter': {'term': {'calls.name': 'particular_func'}},
    'aggs': {'d': {
      'nested': {'path': 'calls.params'},
      'aggs': ...

The structure is three aggregations: a nested wrapping a filter wrapping a
nested. Checking the doc counts on these, I see that the outer two work as
expected: the doc count for the outer nested is the number of nested
"calls" documents, and the doc count for the filter is the number of those
nested "calls" docs that pass the filter. But it appears that the inner
nested resets the buckets: it returns the number of inner nested "params"
documents across all "calls" docs, regardless of the filter.

Is there a way to do what I want?



Re: PagedBytesIndexFieldData cannot be cast to IndexNumericFieldData

2014-01-20 Thread VB
Sure thing, here are the details.

We have a type with the following mapping, with "dynamic": "strict" so
that data of other datatypes cannot get in. Note that it has a field Id
with the long datatype.

When we try to get a statistical facet on Id, it gives the error
PagedBytesIndexFieldData cannot be cast to
org.elasticsearch.index.fielddata.IndexNumericFieldData

It happens randomly: once I wipe out the index and create it again, it
works for some time, and then all of a sudden it starts giving the error.

{
    "portfoliosearch": {
        "dynamic": "strict",
        "properties": {
            "Id": {
                "type": "long",
                "index": "not_analyzed",
                "index_Name": "Id"
            },
            "Name": {
                "type": "string",
                "index": "not_analyzed",
                "index_Name": "Name"
            }
        },
        "_routing": {
            "required": true
        },
        "_parent": {
            "type": "importsetsearch"
        }
    }
}


statistical query

{
  "query": {
    "match_all": {}
  },
  "facets": {
    "Id": {
      "statistical": {
        "field": "Id"
      }
    }
  }
}






[Ann] JDBC river 1.0.0.RC1.2

2014-01-20 Thread joergpra...@gmail.com
Hi,

just a quick note about a new release of JDBC river.

https://github.com/jprante/elasticsearch-river-jdbc/

Changes

- a series of SQL statements can now be executed at each river cycle
- execution of SQL statements with thread pool size (like connection
pooling)
- river state saved and loaded at each cycle
- schedule parameter with crontab-like specification
- no more oneshot strategy
- poll parameter removed in favor of schedule or interval
- acksql, acksqlparams removed in favor of SQL statement series
- driver parameter removed
- new parameter bulk_flush_timeout
- experimental CallableStatement support improved
- river cycle can be executed at once by new REST river induce command
- new REST river state inspection command
- many bug fixes and cleanups

I will rework and extend the documentation pages in the GitHub wiki over
the next few days.

Best,

Jörg

To view this discussion on the web visit
https://groups.google.com/d/msgid/elasticsearch/CAKdsXoFGiseH6%3Dig5xcVDcV9o%2BOVeR%3D1c4HYFXT4nsSv85QbeQ%40mail.gmail.com.


[ANN] ElasticHQ v1.0.0 (ElasticSearch v1.0.0.RC1 Support)

2014-01-20 Thread Roy Russo
Hello All,

Pleased to announce the release of ElasticHQ v1.0.0. 

This release added:


   1. *Support for ElasticSearch v1.0.0.RC1* and unbroke the breaking
   changes. ;-)
   2. Support for monitoring multiple file systems
   3. Support for G1 GC
   4. Allow user to select which nodes are displayed on the Diagnostics 
   Screen 


*Every HQ release is always backwards compatible*, so there's no magic 
needed on your part. As always, you can get it 
here: http://www.elastichq.org/gettingstarted.html

Regards,
Roy Russo
http://www.elastichq.org/blog/




To view this discussion on the web visit
https://groups.google.com/d/msgid/elasticsearch/8582fbfb-d5a0-4cc4-8310-08fbf7aa0456%40googlegroups.com.


Re: Questions about multi_field, configurations, routing control, filtered alias

2014-01-20 Thread Ivan Ji
Hi Binh,

First, thank you very much for your reply. I have a few follow-up
questions, inline below.

On Tuesday, January 21, 2014 at 5:10:10 AM UTC+8, Binh Ly wrote:

 Ivan, 


 1) The multi_field type allows you to define different ways that a *single 
 field value* will be indexed. Your example below will work and will index a 
 single value as string/not_analyzed, and then as an int (use integer for 
 int)

 2) The document coming in will contain a field named name with a single 
 value. When it goes into the index, it will be indexed 2 different ways.

 3) A mapping is not required to index data. There is an implied default 
 mapping that will parse your JSON content and dynamically update the schema 
 if you don't specify one up-front.

 4) You cannot change the shard count after the index is created. You can 
 change the replica count anytime. The update index settings API allows you 
 to change the replica count.

 5) You can specify a single routing value for all documents that you want 
 to go to a specific shard/location.


Yes, but can I make sure that the two sets of documents are stored in
*different* shards? If I use different routing values, does that mean they
will be stored in different shards? I guess not, right? Although the hash
values of the two routing values are different, I am not sure which range
of routing values belongs to a single shard. And I want to store these
documents in different shards.
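
To make the routing question concrete, here is a minimal sketch of how a
routing value is commonly mapped to a shard: hash the routing string and take
the result modulo the number of primary shards. The DJB2-style hash below is
an assumption for illustration; the hash function your Elasticsearch version
actually uses may differ, and the function names are hypothetical. The point
is only that two different routing values can still land on the same shard,
so distinct values alone guarantee nothing.

```python
def djb2_hash(value: str) -> int:
    """DJB2 string hash, wrapped to a signed 32-bit integer like a Java int.

    Assumption: used here only as a stand-in for the real routing hash.
    """
    h = 5381
    for ch in value:
        h = ((h << 5) + h + ord(ch)) & 0xFFFFFFFF
    # Reinterpret the unsigned 32-bit value as signed.
    return h - 0x100000000 if h >= 0x80000000 else h


def shard_for(routing: str, number_of_shards: int) -> int:
    """Map a routing value to a shard number in [0, number_of_shards)."""
    # Python's % already yields a non-negative result for a positive modulus.
    return djb2_hash(routing) % number_of_shards


def find_routing_for_each_shard(number_of_shards: int, prefix: str = "r") -> dict:
    """Try candidate routing values until every shard has one mapped to it."""
    chosen = {}
    i = 0
    while len(chosen) < number_of_shards:
        candidate = f"{prefix}{i}"
        # Keep only the first candidate found for each shard.
        chosen.setdefault(shard_for(candidate, number_of_shards), candidate)
        i += 1
    return chosen


if __name__ == "__main__":
    for shard, routing in sorted(find_routing_for_each_shard(5).items()):
        print(f"shard {shard} <- routing value {routing!r}")
```

Under these assumptions, the only way to be sure two document sets end up on
different shards is to pick routing values whose target shard you have
checked, or to keep the sets in separate indices.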
 


 6) The number of shards will allow you to scale your content later. So if 
 your data volume increases, you can add more nodes later and distribute the 
 shards around. If you only have a single shard and you run out of space, 
 then you cannot scale out unless you increase storage, or increase the 
 shard count.

 7) Scroll is used to do a snapshot type of search - i.e., results you get 
 back will not be affected by updates to the index after you start 
 scrolling. From/size are useful if you want to do paging of search results 
 (or infinite scrolling but paged at a time).

 8) Filters execute fast and yes can be cached.
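
Point 7 above mentions from/size paging. As a minimal sketch, a page number
can be turned into a search request body like this. The helper name and the
match_all query are illustrative assumptions; only the "from" and "size"
keys come from the search API itself.

```python
import json


def page_request(query: dict, page: int, page_size: int) -> dict:
    """Build a search body for a zero-based page using from/size."""
    if page < 0 or page_size <= 0:
        raise ValueError("page must be >= 0 and page_size must be > 0")
    return {"query": query, "from": page * page_size, "size": page_size}


if __name__ == "__main__":
    # Third page (zero-based page 2) with ten hits per page.
    print(json.dumps(page_request({"match_all": {}}, page=2, page_size=10)))
```

Unlike scroll, each such request runs against the index as it is at that
moment, so results can shift between pages if documents change.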


About filters, I want to know the underlying algorithm. If I create an
alias which represents about half of the index, does it increase the index
size? I mean, when I create an alias, does it actually process and store
real data in storage? Or does it just remember the condition and act like
a predefined adapter that stores no data of its own?
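
For reference, a filtered alias is defined through the index aliases API by
sending an "add" action that carries a filter. The sketch below only builds
that request body; the index, alias, field, and value names are hypothetical.
To my understanding the alias keeps just this filter definition as metadata
and applies it at search time, so it should not copy documents or grow the
index, but please verify this against your version.

```python
import json


def filtered_alias_action(index: str, alias: str, field: str, value: str) -> dict:
    """Build the body for POST /_aliases that adds an alias with a term filter."""
    return {
        "actions": [
            {
                "add": {
                    "index": index,
                    "alias": alias,
                    # The filter is stored with the alias and applied to searches.
                    "filter": {"term": {field: value}},
                }
            }
        ]
    }


if __name__ == "__main__":
    body = filtered_alias_action("catalog", "catalog_books", "category", "books")
    print(json.dumps(body, indent=2))
```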


Another question:
What do you suggest if I need to modify the mapping of an index, for
example changing store from no to yes, or removing a field?
From what I have read over the past few days, it seems hard to change an
existing mapping, and there are many limitations.

Again, thanks for your replies.

Cheers,

Ivan

To view this discussion on the web visit
https://groups.google.com/d/msgid/elasticsearch/c581704f-8c66-4151-8816-31065867218b%40googlegroups.com.


Return specific field and highlights via Java API

2014-01-20 Thread ZenMaster80
I am having two issues using the Java API:
1. I am not able to return a specific field in my search query. It shows I
have the right number of results, but displays null.
2. It does not return highlights.
Note: assume indexing is fine, because I am able to get correct results if
I comment out the line .addField("uid").
I am using the defaults for everything. I understand that for highlights
_source for the field has to be enabled, but I thought that if it is not,
it grabs the original source.

json:
{"uid": "123", "name": "hello"},
{"uid": "1234", "name": "hello1"}

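import java.util.Map;

import org.elasticsearch.action.search.SearchResponse;
import org.elasticsearch.node.NodeBuilder;
import org.elasticsearch.search.SearchHit;

// Start an embedded local data node and get a client from it.
node = NodeBuilder.nodeBuilder()
        .local(true)
        .data(true)
        .node();

client = node.client();

// ... createIndex

private void search(String index, String type, String field, String value)
{
    // Request only the "uid" field and ask for highlights on it.
    SearchResponse response = client.prepareSearch(index)
            .setTypes(type)
            .addHighlightedField("uid")
            .addField("uid")
            .execute()
            .actionGet();

    SearchHit[] results = response.getHits().getHits();

    System.out.println("Current results: " + results.length);

    for (SearchHit hit : results) {
        System.out.println("--");
        // This is the call that prints null in my case.
        Map<String, Object> result = hit.getSource();
        System.out.println(result);
    }
}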


To view this discussion on the web visit
https://groups.google.com/d/msgid/elasticsearch/4984505f-9946-4855-8bf0-5dd11b0a1b21%40googlegroups.com.


Synonym configuration

2014-01-20 Thread paul
Hi,

I have read in a lot of places that there are two approaches when working
with synonyms:

   - expanding them at indexing time,
   - expanding them at query time.

Expanding synonyms at query time is not recommended since it raises issues 
with :

   - scoring, since synonyms have different document frequencies,
   - multi-token synonyms, since the query parser splits on white spaces.

So what is the configuration for expanding synonyms only at index time in
Elasticsearch?
Right now my configuration is as below. I am using the synonym filter in
both the index analyzer and the query analyzer, which means I am expanding
at both index time and query time.

"name": {
   "type": "string",
   "index_analyzer": "autocomplete_index",
   "search_analyzer": "autocomplete_search"
},

{
   "settings": {
      "analysis": {
         "analyzer": {
            "synonym": {
               "tokenizer": "whitespace",
               "filter": [
                  "lowercase",
                  "syns_filter"
               ]
            },
            "autocomplete_search": {
               "type": "custom",
               "tokenizer": "whitespace",
               "filter": [
                  "lowercase",
                  "syns_filter",
                  "stop"
               ]
            },
            "autocomplete_index": {
               "type": "custom",
               "tokenizer": "whitespace",
               "filter": [
                  "lowercase",
                  "syns_filter",
                  "stop",
                  "my_edgeNgram"
               ]
            }
         },
         "filter": {
            "syns_filter": {
               "synonyms_path": "synonyms/synonym_collegename.txt",
               "type": "synonym",
               "ignore_case": true,
               "expand": false
            }
         }
      }
   }
}
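
As a rough sketch of an index-time-only setup, the settings below keep the
synonym filter in the index analyzer and leave it out of the search
analyzer. This is one possible arrangement, not a confirmed recommendation.
It assumes "expand" is set to true, because with "expand": false every
synonym is reduced to a single term and queries would then need the same
filter to match. The my_edgeNgram filter is left out only because its
definition is not shown in the message; the helper name is hypothetical.

```python
import json


def index_time_synonym_settings(synonyms_path: str) -> dict:
    """Build index settings where synonyms are applied at index time only."""
    synonym_filter = {
        "type": "synonym",
        "synonyms_path": synonyms_path,
        "ignore_case": True,
        # Assumption: expand so that every synonym variant is indexed.
        "expand": True,
    }
    return {
        "settings": {
            "analysis": {
                "filter": {"syns_filter": synonym_filter},
                "analyzer": {
                    # Index analyzer: synonyms are applied here.
                    "autocomplete_index": {
                        "type": "custom",
                        "tokenizer": "whitespace",
                        "filter": ["lowercase", "syns_filter", "stop"],
                    },
                    # Search analyzer: no synonym filter, so no query-time expansion.
                    "autocomplete_search": {
                        "type": "custom",
                        "tokenizer": "whitespace",
                        "filter": ["lowercase", "stop"],
                    },
                },
            }
        }
    }


if __name__ == "__main__":
    settings = index_time_synonym_settings("synonyms/synonym_collegename.txt")
    print(json.dumps(settings, indent=2))
```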


-paul

To view this discussion on the web visit
https://groups.google.com/d/msgid/elasticsearch/1c21cd11-eb92-47b5-b695-61b33bd256fa%40googlegroups.com.


Re: facets.total and hits.total dont match

2014-01-20 Thread Chetana
I have indexed some records with test_field set to 'analyzed'. If the
analyzed field is causing this issue, is there any other facet type or
workaround that can solve the problem?
 

On Tuesday, January 21, 2014 12:15:45 PM UTC+5:30, Chetana wrote:

 I have an application where I need both search results and facet 
 information. Every time, a query is built from some filter conditions and 
 query words, and it is passed to both the facet and the search request as 
 shown below. The field (test_field) to which the facet is applied is 
 present in all documents. 
  
 BoolQueryBuilder boolQueryBuilder = QueryBuilders.boolQuery();
 SearchRequestBuilder srb = client.prepareSearch("Test");
 srb.setSearchType(SearchType.DFS_QUERY_THEN_FETCH).setQuery(boolQueryBuilder);

 and

 TermsFacetBuilder facBuilder = FacetBuilders.termsFacet("test_field");
 facBuilder.facetFilter(FilterBuilders.queryFilter(boolQueryBuilder));
 facBuilder.fields("test_field");
 facBuilder.global(true);   // I tried commenting this out too, but I get the same result
 srb.addFacet(facBuilder);
  
 "hits" : {
   "total" : 117,
   "max_score" : null,
   "hits" : [ {
   } ]
 }

 "facets" : {
   "assettype" : {
     "_type" : "terms",
     "missing" : 5,
     "total" : 119,
     "other" : 0,
     "terms" : [ {
     } ]
   }
 }
  
 But the hit count is different from the facet count. Can anyone please 
 explain why there is this discrepancy?
  
  
 Thanks
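
One possible explanation, offered as an assumption rather than a confirmed
diagnosis: for a terms facet, "total" counts term values rather than
documents, and "missing" counts documents with no value for the field. With
an analyzed field, one document can contribute several terms, so the facet
total can exceed the hit count. The toy sketch below uses made-up documents
and a hypothetical helper name to show the arithmetic.

```python
def terms_facet_counts(docs: list, field: str) -> dict:
    """Count hits, facet term values, and missing docs for already-analyzed docs.

    Each document maps a field name to the list of tokens produced by analysis.
    """
    hits_total = len(docs)
    # A document is "missing" when the field is absent or has no tokens.
    missing = sum(1 for doc in docs if not doc.get(field))
    # Every token of every document counts toward the facet total.
    facet_total = sum(len(doc.get(field) or []) for doc in docs)
    return {"hits_total": hits_total, "facet_total": facet_total, "missing": missing}


if __name__ == "__main__":
    sample_docs = [
        {"test_field": ["data", "center"]},        # analyzed into two tokens
        {"test_field": ["server"]},                # one token
        {"test_field": ["rack", "mount", "kit"]},  # three tokens
        {},                                        # no value: counted as missing
    ]
    print(terms_facet_counts(sample_docs, "test_field"))
```

If that is what is happening, faceting on a not_analyzed copy of the field
would give one term per document value.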



To view this discussion on the web visit
https://groups.google.com/d/msgid/elasticsearch/a8ed55c0-6599-4612-995d-28d3340e69f7%40googlegroups.com.


Open ports between nodes an main configuration

2014-01-20 Thread Justas Balcas
Hello All,

I have configured elasticsearch with ports 9200 and 9300. Everything works
fine, as expected, but there is one configuration option I haven't found:

elasticsearch 0.90.10
3 indexes, each with 5 nodes without replicas

When I run 'lsof -i', I get a long list of open ports (ESTABLISHED) from
39619 to 39634, and on another server that I run for experiments it makes
ESTABLISHED connections from ports 43010 to 43025.

Is it possible to specify in the configuration that it should use ports
from 9301 to 9400? Or how can I know which ports it will always take?
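
A likely reading of the lsof output, stated as an assumption: the high port
numbers are the source ports of outgoing TCP connections, which the operating
system picks from its ephemeral range, while the fixed port (9300) is only
the one being listened on and connected to. The sketch below shows this
general TCP behaviour with plain sockets on localhost. It does not use or
configure Elasticsearch, and the function name is hypothetical.

```python
import socket


def observe_ports() -> tuple:
    """Open one local TCP connection and report the ports on both sides.

    Returns (listen_port, client_source_port, peer_port_seen_by_server).
    """
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    client = None
    conn = None
    try:
        # Port 0 lets the OS choose a free port; it stands in for a fixed
        # listening port such as 9300.
        server.bind(("127.0.0.1", 0))
        server.listen(1)
        listen_port = server.getsockname()[1]

        # The connecting side does not choose its source port; the OS does.
        client = socket.create_connection(("127.0.0.1", listen_port))
        conn, peer = server.accept()
        client_source_port = client.getsockname()[1]
        return listen_port, client_source_port, peer[1]
    finally:
        for sock in (conn, client, server):
            if sock is not None:
                sock.close()


if __name__ == "__main__":
    listen_port, source_port, peer_port = observe_ports()
    print(f"listening on {listen_port}, client connected from {source_port}")
```

If this is what the list shows, restricting those ports would be a matter of
the operating system's ephemeral port range rather than an Elasticsearch
setting.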

Thanks.

To view this discussion on the web visit
https://groups.google.com/d/msgid/elasticsearch/bedf0990-65a8-4469-aed1-cd9e32af5828%40googlegroups.com.