Re: Exception cause unwrapping ran for 10 levels

2014-01-07 Thread Jason Wee
Jörg,

Done, https://github.com/elasticsearch/elasticsearch/issues/4639

Today I investigated this issue and ran a query for the timestamps at which
the exceptions occurred; data was indexed during that period. I ran the query
because we were worried that no data had been indexed while the exceptions
were happening, which would have meant lost data.

Thank you.

Jason


On Tue, Jan 7, 2014 at 4:34 PM, joergpra...@gmail.com wrote:

 Yes, it looks like two nodes do not agree about an update action and a
 version conflict is pinging between them, node1 and node4.

 Not sure if this happens during index recovery or while an update is
 executed, but it is definitely worth raising an issue on the Elasticsearch
 GitHub to let the Elasticsearch core team have a look. It might be some
 kind of deadlock.

 Jörg

  --
 You received this message because you are subscribed to the Google Groups
 elasticsearch group.
 To unsubscribe from this group and stop receiving emails from it, send an
 email to elasticsearch+unsubscr...@googlegroups.com.
 To view this discussion on the web visit
 https://groups.google.com/d/msgid/elasticsearch/CAKdsXoGKNB-1KXab4eWhDnKpe4szdPsidEWq2his2j%3DfPwU7Zw%40mail.gmail.com
 .

 For more options, visit https://groups.google.com/groups/opt_out.




Sorting by deeply nested filters

2014-01-07 Thread Vesa Marttila
Hi,

I am trying to sort with a doubly nested filter, but this doesn't seem to work. 
Is this even supposed to be possible? I can provide the 
query if necessary, but it is quite complicated and would require a bit of 
obfuscating.

Sincerely,
Vesa Marttila



Re: Query: Parents with at least x children of type y

2014-01-07 Thread Alexander Stautner
Changing the parent doc should be avoided, because new child_types may be 
added or old child_types removed, and we want the 
child_types to be independent of the parent_type.

So it seems that at the moment there is no way of doing such queries 
with Elasticsearch?

Thanks for helping.

On Tuesday, 7 January 2014 10:39:41 UTC+1, David Pilato wrote:

 I would probably add a num_of_children field in parent doc and update it 
 when a new child is added or removed.

 But I guess it depends on your actual use case!
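
 The counter idea above could be queried roughly as follows. This is only a sketch: the field name num_girls_named_jane is hypothetical, the application is assumed to keep the counter up to date whenever a child is added or removed, and the body uses the filtered query form of the Elasticsearch 0.90/1.x era.

```python
import json


def parents_with_min_children(min_children, counter_field="num_of_children"):
    """Build a search body matching parents whose counter is >= min_children.

    counter_field is a denormalized count stored on the parent document;
    its name is an assumption made for this illustration.
    """
    return {
        "query": {
            "filtered": {
                "query": {"match_all": {}},
                "filter": {"range": {counter_field: {"gte": min_children}}},
            }
        }
    }


if __name__ == "__main__":
    # Families with at least three girls named Jane (hypothetical counter field).
    print(json.dumps(parents_with_min_children(3, "num_girls_named_jane"), indent=2))
```

 The trade-off is the one raised in the reply: every distinct condition needs its own counter on the parent, so the parent document has to change whenever the child types or conditions change.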

 --
 David ;-)
 Twitter : @dadoonet / @elasticsearchfr / @scrutmydocs

 On 7 Jan 2014 at 08:15, Alexander Stautner alexander...@gmail.com wrote:

 Sorry for bumping, but I need to know whether it is possible to answer the 
 question above with Elasticsearch.

 On Thursday, 2 January 2014 15:22:32 UTC+1, Alexander Stautner wrote:

 Hello,
 after some research without any results I have a question about 
 parent/child relations.

 The case:

 I have a parent of type parent_type which has children of different 
 types e.g. child_type_1, child_type_2, child_type_3.

 My question is:

 Is there any way to get only the parents that have at least x 
 children of type child_type_2 with a specific value in an attribute?

 e.g.

 parent_type: family
 child_type_1: girl attribute:name
 child_type_2: boy attribute:name
 child_type_3: cat attribute:name

 I want to find all families that have at least three girls named 
 Jane.


 Thank you for your help,
 Alex 






Order results by value in one of the array entries.

2014-01-07 Thread Johan E
Hi,

I'm trying to order the results of a query by a specific entry in an array.

Here is a sample entry:


{
  "product_name": "product alfa",
  "product_id": "4a86c92ccd26111d7ba0eada7da6a75af",
  "description": "This is a sample product",
  "image_id": "product_a.jpg",
  "inventory": [
    {
      "warehouse": "warehouse_a",
      "stock": 99
    },
    {
      "warehouse": "warehouse_b",
      "stock": 19
    },
    {
      "warehouse": "warehouse_c",
      "stock": 99
    }
  ]
}

If there were more products containing "alfa", I would (for example) want 
to sort them by the stock of one warehouse.

I'm currently using a query like:

POST _search
{
  "query": {
    "match": {
      "product_name": {
        "query": "alfa",
        "type": "phrase"
      }
    }
  },
  "filter": {
    "bool": {
      "must": [
        {
          "term": {
            "availability.warehouse": "warehouse_a"
          }
        }
      ]
    }
  }
}

I would like the results sorted by stock (for warehouse_a only) descending.

Any ideas?
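
One possible direction, sketched here under assumptions rather than as a confirmed answer: if the inventory array is mapped as a nested type, the sort clause of this Elasticsearch generation accepts nested_path and nested_filter, so only the matching warehouse entry contributes to the sort value. The sketch uses the field names from the sample document (inventory.warehouse, inventory.stock), not the availability.warehouse field from the query above, and it assumes a nested mapping that the post does not show.

```python
import json


def search_sorted_by_warehouse_stock(phrase, warehouse):
    """Build a search body sorted by the stock of a single warehouse.

    Assumes `inventory` is mapped as a nested type (not shown in the post).
    """
    warehouse_filter = {"term": {"inventory.warehouse": warehouse}}
    return {
        "query": {"match": {"product_name": {"query": phrase, "type": "phrase"}}},
        # Keep only products that have an entry for the warehouse.
        "filter": {"nested": {"path": "inventory", "filter": warehouse_filter}},
        # Sort on the stock of that warehouse's entry only.
        "sort": [
            {
                "inventory.stock": {
                    "order": "desc",
                    "nested_path": "inventory",
                    "nested_filter": warehouse_filter,
                }
            }
        ],
    }


if __name__ == "__main__":
    print(json.dumps(search_sorted_by_warehouse_stock("alfa", "warehouse_a"), indent=2))
```

Without a nested mapping the array entries are flattened, so stock values from all warehouses would be mixed together when sorting.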



Re: Sorting by deeply nested filters

2014-01-07 Thread Vesa Marttila
Just to add: the filter works as desired when used in queries; the 
problems only occur when it is used for sorting.

Vesa



Re: ElasticsearchHadoop Hive integration issue

2014-01-07 Thread Costin Leau

Hi,

The 'es.resource' you specified is incorrect - you need to specify both an 
index and a type - e.g. myIndex/products
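
To illustrate the expected shape of the value, here is a small sketch of a check for the index/type form. It is a hypothetical helper written for this note, not the validation that elasticsearch-hadoop itself performs.

```python
def split_resource(resource):
    """Split an 'index/type' resource string into (index, type).

    Raises ValueError when either part is missing, e.g. for 'products'.
    """
    parts = resource.strip("/").split("/")
    if len(parts) != 2 or not all(parts):
        raise ValueError("expected 'index/type', got %r" % resource)
    return parts[0], parts[1]


if __name__ == "__main__":
    print(split_resource("myIndex/products"))
```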


P.S. Are you using M1 or the current master? The latter should give a proper 
error (and message).

Thanks,


On 07/01/2014 9:48 AM, Badal Mohapatra wrote:

Hi,

I am trying to index data from a Hive table into Elasticsearch, using 
the latest elasticsearch-hadoop-master plugin.
My Elasticsearch version is 0.90.9 and my Hive version is hive-0.11.0.

Following the documentation of the elasticsearch-hadoop plugin (Hive integration), I 
successfully created an external table
with the command below:

CREATE EXTERNAL TABLE es_products (
    sku int,
    rating float,
    name string,
    type string,
    saleprice float,
    department string,
    manufacturer string,
    userid string,
    category_name string,
    query string)
STORED BY 'org.elasticsearch.hadoop.hive.ESStorageHandler'
TBLPROPERTIES('es.resource' = 'products');

Even though the external table is created,
I am not able to insert data into it or even query it.
When I run select * from es_products;
I get the exception below.

hive> select * from es_products;
OK
Failed with exception java.io.IOException:java.lang.StringIndexOutOfBoundsException: String index out of range: -1
Time taken: 1.699 seconds


Can someone please suggest what I am doing wrong?

Kind Regards,
Badal




--
Costin



Re: Hipchat Elasticsearch

2014-01-07 Thread Ivan Brusic
Here are some related links, including a video of a talk:
http://www.meetup.com/Elasticsearch-San-Francisco/events/141698772/

-- 
Ivan


On Tue, Jan 7, 2014 at 1:43 AM, Ümit Seren uemit.se...@gmail.com wrote:

 Interesting read about elasticsearch in HipChat


 http://highscalability.com/blog/2014/1/6/how-hipchat-stores-and-indexes-billions-of-messages-using-el.html





Upgrades causing Elastic Search downtime

2014-01-07 Thread Jenny Sivapalan
Hello,

We've upgraded Elasticsearch twice over the last month and have 
experienced downtime (roughly 8 minutes) during the rollout. I'm not sure 
whether we are doing something wrong.

We use EC2 instances for our Elasticsearch cluster and CloudFormation to 
manage our stack. When we deploy a new version or a change to Elasticsearch, 
we upload the new artefact, double the number of EC2 instances and wait for 
the new instances to join the cluster.

For example, 6 nodes form a cluster on v0.90.7. We upload the 0.90.9 
version via our deployment process and double the number of nodes in the 
cluster (to 12). The 6 new nodes join the cluster running version 
0.90.9.

We then want to remove each of the 0.90.7 nodes. We do this by shutting 
down the node (using the head plugin), waiting for the cluster to rebalance 
the shards and then terminating the EC2 instance. We then repeat with the next 
node. We leave the master node until last so that re-election happens 
just once.
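
For comparison, a different way to retire a node is to move its shards away before stopping it, using shard allocation filtering. This is not what the procedure above does; it is a sketch of an alternative, and the IP address shown is a placeholder. The helpers only build the requests (cluster update settings and cluster health, as available in the 0.90/1.x line) and do not send them.

```python
import json


def exclude_node_request(ip_address):
    """Return (method, path, body) asking the cluster to move shards off a node.

    The transient setting is dropped again on a full cluster restart.
    """
    body = {"transient": {"cluster.routing.allocation.exclude._ip": ip_address}}
    return "PUT", "/_cluster/settings", json.dumps(body)


def drain_health_path(timeout="10m"):
    """Health call that returns once the cluster is green and nothing is relocating."""
    return (
        "/_cluster/health?wait_for_status=green"
        "&wait_for_relocating_shards=0&timeout=" + timeout
    )


if __name__ == "__main__":
    # 10.0.0.7 is a made-up address standing in for one of the old nodes.
    print(exclude_node_request("10.0.0.7"))
    print(drain_health_path())
```

With this approach the node is only shut down after the health call returns, so the shutdown itself should not trigger further recovery for shards that node used to hold.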

The issue we have found in the last two upgrades is that while the 
penultimate node is shutting down, the master starts throwing errors and the 
cluster goes red. To fix this we've stopped the Elasticsearch process on the 
master and have had to restart each of the other nodes (though perhaps they 
would have rebalanced themselves given more time?). We find that we 
send an increased number of error responses to our clients during this time.

We've set our queue size for search to 300 and we see the queue 
filling up:
    at java.lang.Thread.run(Thread.java:724)
2014-01-07 15:58:55,508 DEBUG action.search.type[Matt Murdock] [92036651] Failed to execute fetch phase
org.elasticsearch.common.util.concurrent.EsRejectedExecutionException: rejected execution (queue capacity 300) on org.elasticsearch.action.search.type.TransportSearchQueryThenFetchAction$AsyncAction$2@23f1bc3
    at org.elasticsearch.common.util.concurrent.EsAbortPolicy.rejectedExecution(EsAbortPolicy.java:61)
    at java.util.concurrent.ThreadPoolExecutor.reject(ThreadPoolExecutor.java:821)
   

But also we see the following error which we've been unable to find the 
diagnosis for:
2014-01-07 15:58:55,530 DEBUG index.shard.service   [Matt Murdock] [index-name][4] Can not build 'doc stats' from engine shard state [RECOVERING]
org.elasticsearch.index.shard.IllegalIndexShardStateException: [index-name][4] CurrentState[RECOVERING] operations only allowed when started/relocated
    at org.elasticsearch.index.shard.service.InternalIndexShard.readAllowed(InternalIndexShard.java:765)

Are we doing anything wrong or has anyone experienced this? 

Thanks,
Jenny



Re: score based on term frequency only

2014-01-07 Thread Ivan Brusic
Great feature. However, it looks like it is only available in the master
branch: https://github.com/elasticsearch/elasticsearch/issues/3772
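
As a toy illustration of the goal in this thread, the sketch below compares a simplified tf-times-length-norm score, in the style of Lucene's DefaultSimilarity, with a score based on the raw term count alone. It deliberately leaves out idf, query normalization and Lucene's lossy norm encoding, so it is not Lucene's actual scoring, only a way to see why dropping the norms matters.

```python
import math


def classic_style_score(tokens, term):
    """sqrt(term frequency) multiplied by a 1/sqrt(field length) norm."""
    if not tokens:
        return 0.0
    freq = tokens.count(term)
    return math.sqrt(freq) * (1.0 / math.sqrt(len(tokens)))


def count_only_score(tokens, term):
    """Score by the raw number of occurrences, ignoring field length."""
    return float(tokens.count(term))


if __name__ == "__main__":
    doc1 = ["apple"]
    doc2 = ["apple", "apple"]
    for name, doc in (("doc1", doc1), ("doc2", doc2)):
        print(name, classic_style_score(doc, "apple"), count_only_score(doc, "apple"))
```

In this simplified form the length norm cancels the extra occurrence in the second document, whereas the count-only score ranks the second document higher, which is the behaviour asked for.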

-- 
Ivan


On Tue, Jan 7, 2014 at 8:31 AM, Britta Weber britta.we...@elasticsearch.com
 wrote:

 You could also use a script as described here:


 http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/modules-advanced-scripting.html


 Cheers,
 Britta

 On Mon, Jan 6, 2014 at 2:13 AM, Ivan Brusic i...@brusic.com wrote:
  You could provide your own Similarity class as a plugin. I don't have any
  sample code in front of me, but it would be based on TFIDFSimilarity and
  you would basically need to ignore the norms and other values.
 
 
 http://lucene.apache.org/core/4_6_0/core/org/apache/lucene/search/similarities/TFIDFSimilarity.html
 
  The IDF portion could probably remain, since it ranks the different terms
  in your query, not the score of each term.
 
  Cheers,
 
  Ivan
 
 
 
  On Sun, Jan 5, 2014 at 1:57 PM, Kevin S kevinste...@gmail.com wrote:
 
  I would like to score based entirely on term count.
 
  For example, given the following two documents:
 
  1) { apple }
 
  2) { apple apple }
 
  Searching for "apple" ranks the first before the second. I wish to give the
  second, in which the term occurs twice, a higher score.
 
  Can someone please point me in the right direction for this?
 
  Thank you.
 





Scoring and Relative-ness based on Business Rules

2014-01-07 Thread David Mitchell
What is the best way to make products more relevant beyond the default 
scoring?

I have an unknown number of business rules that will dictate a document's 
relevance. That is, even if one document scores higher than another, the 
lower-scoring document may be more relevant to the user. 

Given two products with similar titles but different attributes and the 
query "ipad", I'd like to promote one over the other:

{
  "title_simple": "iPad Mini Case",
  "description_simple": "Royce Leather iPad Mini Case:...",
  "category": "Computers & Accessories",
  "brand": "Royce Leather",
  "id": "794809052574"
}

{
  "title_simple": "Apple iPad mini (16GB, Wi-Fi + Sprint 4G, White)",
  "description_simple": "iPad mini features a beautiful 7.9\" display...",
  "category": "Electronics",
  "brand": "Apple",
  "id": "885909689712"
}


A simple query scores the iPad case high:

{
  "query": { "term": { "title_simple": "ipad" } }
}


But business rules dictate that the actual iPad be on the top. 

I can run a filter or score based on the attribute or brand to get what I'm 
looking for:

{
  "query": {
    "function_score": {
      "query": { "term": { "title_simple": "ipad" } },
      "functions": [{
        "filter": { "term": { "category_simple": "electronics" } },
        "boost_factor": 2
      }]
    }
  }
}

But building a bunch of these isn't scalable or reasonable. 

I have an unknown number of these rules, and that number will continue to grow. 
Some other examples:

- query "xbox" should promote consoles over games
- query "macbook" should promote Apple computers over MacBook sleeves
- query "Apple" should promote Apple products and not food

Building a thousand queries based on function filters is unreasonable and 
unscalable. 

Some possible solutions I've considered:

- Building a lookup table that will build the filter portion of the query 
(this could become unmaintainable)
- Including a pre-calculated score in the document (unfortunately, this doesn't 
work on a per-query basis, as the score may change based on the user's 
needs)
- Extending the DefaultSimilarity class (I'm not sure how this helps me in 
this scenario, though)
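
The first option, a lookup table that builds the filter portion of the query, might look roughly like the sketch below. The RULES table, the "consoles" category value and the boost values are made up for illustration; only the function_score shape with filter and boost_factor comes from the query above.

```python
import json

# Hypothetical rule table: query term -> list of (filter, boost_factor) pairs.
RULES = {
    "ipad": [({"term": {"category_simple": "electronics"}}, 2.0)],
    "xbox": [({"term": {"category_simple": "consoles"}}, 2.0)],
}


def build_query(user_query, rules=RULES):
    """Wrap the base term query in function_score when a rule exists."""
    term = user_query.lower()
    base = {"term": {"title_simple": term}}
    boosts = rules.get(term, [])
    if not boosts:
        return {"query": base}
    functions = [{"filter": flt, "boost_factor": factor} for flt, factor in boosts]
    return {"query": {"function_score": {"query": base, "functions": functions}}}


if __name__ == "__main__":
    print(json.dumps(build_query("iPad"), indent=2))
    print(json.dumps(build_query("toaster"), indent=2))
```

This keeps the query-building code constant, but the maintenance concern remains: somebody still has to curate the table as the catalogue and the set of rules grow.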

What have other people done to solve these problems? Is there something 
else that I'm missing that could help?

Here's a runnable gist - 
https://gist.github.com/dlmitchell/826e8fb7ca89bed30e4a/raw/613be2c202b26f5899bdcfeac714737beb49/sample_mapping.sh





Re: Too many open files

2014-01-07 Thread Adolfo Rodriguez
Hi, my model is quite slow with just a few thousand documents.

I realised that, when opening a node

node = builder.client(clientOnly).data(!clientOnly).local(local).node();
client = node.client();

from my Java program to ES with such a small model, ES automatically 
creates 10 sockets. Coincidentally, I have 10 shards (?).

* Is this the expected behavior?
* Can I reduce the number of ES shards dynamically to reduce the number of 
sockets, or should I redeploy my ES install?
* By opening other connections I eventually reach up to 200 simultaneously open 
sockets and, I am afraid, some of the results are randomly lost when 
fetching highlight information. Could these missing results be 
a consequence of too many open sockets?
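
On the shard question: in Elasticsearch versions of this era the number of primary shards is fixed when an index is created, while the number of replicas can be changed on a live index. A sketch of the two request bodies follows; the values are examples only, and whether fewer shards would actually reduce the socket count is not established in this thread.

```python
import json


def create_index_body(shards=1, replicas=0):
    """Body for creating an index with an explicit shard and replica count."""
    return {
        "settings": {
            "index": {"number_of_shards": shards, "number_of_replicas": replicas}
        }
    }


def update_replicas_body(replicas):
    """Body for the update index settings call; only the replica count changes."""
    return {"index": {"number_of_replicas": replicas}}


if __name__ == "__main__":
    # Sent with PUT /<index> and PUT /<index>/_settings respectively.
    print(json.dumps(create_index_body(shards=1, replicas=0)))
    print(json.dumps(update_replicas_body(1)))
```

Changing the primary shard count of existing data therefore means creating a new index with the desired settings and reindexing into it.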

Thanks for your pointers.

Thanks



Beta2 Java Client: java.nio.channels.UnresolvedAddressException

2014-01-07 Thread davrob2
Hi,

I'm having difficulty connecting with the Java client to 1.0.0.Beta2. The 
cluster is up and healthy, and monitoring works fine using elasticsearch Head, 
elasticsearch HQ etc.

This is the stack trace I am getting:

https://gist.github.com/dav-rob/8304130 

thanks,

David.



incrementally scaling ES from the small data

2014-01-07 Thread Adolfo Rodriguez
Hi, I plan to start with a small project, initially with little data (a few 
thousand records), to learn how ES responds, and then incrementally increase data 
and resources on demand, up to big data, taking advantage of ES 
scalability.

Is there a document describing such a strategy, i.e.:

* how to properly configure a small, basic deployment with good performance 
on low resources (shards, nodes, clusters...)?

* then, how to detect when it becomes necessary to add 
resources (shards, nodes...) as the data load increases?


All the docs that I find on scaling ES start from deployments with millions or billions of 
records.

Alternatively, any advice on properly configuring ES for small data 
(as a starting point)?

Thanks



Re: Too many open files

2014-01-07 Thread Adolfo Rodriguez
I guess my problem with the excessive number of sockets could also be a 
consequence of having 2 JVMs running ES, one embedded in Tomcat and a second 
embedded in another Java app, as described here:

https://groups.google.com/forum/?hl=en-GB#!topicsearchin/elasticsearch/scale%7Csort:date%7Cspell:true/elasticsearch/m9IWpGzoLLE

Does anyone have experience running a single embedded ES (as jar files), for 
example in Tomcat's lib folder, consumed by several Tomcat apps and 
by other standalone apps in different JVMs?

Any opinion on this configuration as a starting point?



Re: Scoring and Relative-ness based on Business Rules

2014-01-07 Thread Justin Treher
I think you will find that for small documents that aren't really 
documents at all, but rather a mass of data points, such as a product 
library, you won't use the built-in scoring at all. The built-in 
scoring works well for books and articles (long works of text). For a 
product library, you will use an array of custom boosts through the 
function score query. The key is to get all those data points into your 
documents so that you can boost on matches.

For example, with "xbox", you could have a keywords field that includes 
"xbox" just for consoles. Maybe "Xbox" is the title of the product, while games 
just have Xbox listed as their console compatibility. Then only matches in the 
titles will score higher. 

For the macbook, you could have an accessories flag, where items flagged as 
an accessory receive a negative boost. 

For Apple food vs. Apple products, you can use sales data or user history. 

The key to relevancy that works for your organization is 
providing Elasticsearch with all the data points on which to base its decisions. For 
products, your best solution is a big old set of constant score queries 
wrapped in some wild function score queries.
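
A rough sketch of what such data-point boosts could look like as a request body follows. The keywords and is_accessory fields and the factor values are hypothetical, and the "negative boost" is modelled here as a boost_factor below 1, which demotes rather than excludes.

```python
import json


def product_query(text):
    """Build a function_score query driven by document data points.

    `keywords` and `is_accessory` are assumed fields that would have to be
    indexed with each product; they are not part of the posted documents.
    """
    return {
        "query": {
            "function_score": {
                "query": {"match": {"title_simple": text}},
                "functions": [
                    # Promote products explicitly tagged with the search term.
                    {"filter": {"term": {"keywords": text.lower()}}, "boost_factor": 3.0},
                    # Demote anything flagged as an accessory.
                    {"filter": {"term": {"is_accessory": True}}, "boost_factor": 0.5},
                ],
            }
        }
    }


if __name__ == "__main__":
    print(json.dumps(product_query("macbook"), indent=2))
```

Unlike a per-query rule table, the functions here stay the same for every search; the per-product knowledge lives in the indexed fields.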




Transport Client hangs in my web application during search.

2014-01-07 Thread Search User
I have a web application in which I create a Transport Client using Spring 
(as a singleton) and inject it into my service. When I receive a request in my 
controller, the controller calls the service, and the service uses the transport 
client to execute the query and return the results. When I deploy this 
application in Tomcat, the client is created, but when I execute the 
query, the client hangs. 

If I create the client for every request (in my service) and run the query, 
everything is fine. Can someone help me understand this behavior?

Following is my code to create the Client object.

Settings settings = ImmutableSettings.settingsBuilder()
        .put("cluster.name", "mysearchcluster")
        .put("client.transport.sniff", true)
        .build();
Client client = new TransportClient(settings)
        .addTransportAddress(new InetSocketTransportAddress("10.150.200.101", 9300));



Thanks



Re: Scoring and Relative-ness based on Business Rules

2014-01-07 Thread David Mitchell
Thanks for your answer. 

So, instead of relying on queries to pull out the right stuff, you're 
suggesting to model the documents to the queries. 

This suggests that there's a custom boost for every search term, which is 
what I was hoping to avoid, if only because of the impossible task of going 
through all our data and determining what to boost/not boost. This also 
implies that there's another key/value store of queries-to-boost keywords, 
which again could get costly to maintain. 

If I'm understanding you correctly, it would look similar to what I 
previously posted, but only with a larger (possibly dynamic) set of boost 
queries. 

Doing so is primarily a manual task - are there more automatic ways to 
build up relevancy, or even tools/processes that help? 

On Tuesday, January 7, 2014 11:50:40 AM UTC-8, Justin Treher wrote:

 I think you will find that for small documents, that aren't actually 
 documents at all, but really a mass of data points, such as a product 
 library, you won't even use the built in scoring at all. The built in 
 scoring works well for books and articles (long works of text). For a 
 product library, you will use an array of custom boosts through the 
 function score query. The key is to get all those data points in your 
 documents so that you can boost on matches.

 For example, with xbox, you could have a keywords field that includes 
 xbox just for consoles. Maybe Xbox is the title of the product while games 
 just have Xbox listed as their console compatibility. Only matches in the 
 titles will score higher. 

 For the macbook, you could have an accessories flag where items flagged as 
 an accessory receive a negative boost. 

 For Apple food vs. Apple products, you can use sales data or user history. 

 The key to relevancy that works for your organization is providing 
 Elasticsearch with all the data points it needs to base its decisions on. For 
 products, your best solution is a big old set of constant score queries 
 wrapped in some wild function score queries.

 On Tuesday, January 7, 2014 12:36:43 PM UTC-5, David Mitchell wrote:

 What is the best way to make products more relevant outside of the 
 default scoring?

 I have an unknown number of business rules that will dictate a document's 
 relevance. That is, even if one document scores higher than another, the 
 lower-scoring document may be the more relevant one to the user. 

 Given two products with similar titles but different attributes and the 
 query ipad, I'd like to promote one over the other:

 {
    "title_simple": "iPad Mini Case",
    "description_simple": "Royce Leather iPad Mini Case:...",
    "category": "Computers & Accessories",
    "brand": "Royce Leather",
    "id": 794809052574
 }

 {
    "title_simple": "Apple iPad mini (16GB, Wi-Fi + Sprint 4G, White)",
    "description_simple": "iPad mini features a beautiful 7.9\" display...",
    "category": "Electronics",
    "brand": "Apple",
    "id": 885909689712
 }


 A simple query scores the iPad case high:

 {
    "query": { "term": { "title_simple": "ipad" }}
 }


 But business rules dictate that the actual iPad be on the top. 

 I can run a filter or score based on the attribute or brand to get what 
 I'm looking for:

 {
    "query": {
       "function_score": {
          "query": { "term": { "title_simple": "ipad" } },
          "functions": [{
             "filter": { "term": { "category_simple": "electronics" } },
             "boost_factor": 2
          }]
       }
    }
 }

 But building a bunch of these isn't scalable or reasonable. 

 I have an unknown number of these and that number will continue to grow. 
 Some other examples:

 - query xbox should promote consoles over games
 - query macbook should promote Apple computers over macbook sleeves
 - query Apple should promote Apple products and not food

 Building a thousand queries based on function filters is unreasonable 
 and unscalable. 

 Some possible solutions I've considered:

 - building a lookup table that will build the filter portion of the query 
 (this could get unmaintainable)
 - Including a pre-calculated score in the document (unfortunately, 
 doesn't work on a per query basis, as the score may change based on the 
 user's needs)
 - Extending the DefaultSimilarity class (I'm not sure how this helps me in 
 this scenario, though)
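
The lookup-table idea from the list above can be sketched as follows. This is only an illustration in Python, assuming a hypothetical in-memory rule table (BOOST_RULES) and a hypothetical "consoles" category value; the field names and the boost_factor key are taken from the function_score query shown earlier in this message.

```python
import json

# Hypothetical rule table: search term -> list of (field, value, boost) tuples.
# In practice this could live in a database or in its own index.
BOOST_RULES = {
    "ipad": [("category_simple", "electronics", 2)],
    "xbox": [("category_simple", "consoles", 2)],  # "consoles" is an assumed value
}


def build_query(term, rules=BOOST_RULES):
    """Build a search body, adding boost functions only when rules exist."""
    base = {"term": {"title_simple": term}}
    functions = [
        {"filter": {"term": {field: value}}, "boost_factor": boost}
        for field, value, boost in rules.get(term, [])
    ]
    if not functions:
        # No business rule for this term: fall back to the plain term query.
        return {"query": base}
    return {"query": {"function_score": {"query": base, "functions": functions}}}


print(json.dumps(build_query("ipad"), indent=2))
```

The query-building code stays fixed while the table grows, so the maintenance cost moves into the rule data rather than into the queries themselves.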

 What have other people done to solve these problems? Is there something 
 else that I'm missing that could help?

 Here's a runnable gist - 
 https://gist.github.com/dlmitchell/826e8fb7ca89bed30e4a/raw/613be2c202b26f5899bdcfeac714737beb49/sample_mapping.sh






Any fix timeline for split brain issue: 2488

2014-01-07 Thread bitsofinfo . g
Hi, is there any timeline on a fix 
for https://github.com/elasticsearch/elasticsearch/issues/2488 ?

thanks!



Re: Match exact substring in not analyzed field

2014-01-07 Thread InquiringMind
This is an interesting problem. Typically, my view of stop words is dim. I 
would prefer that the client side avoid searching on them if that is 
desired, rather than have the engine ignore them. Then, phrase matching can 
work properly. And queries such as The Wall can look for just Wall (ignoring 
The as a stop word), but then the Google-like +The Wall can look for The 
Wall. Yeah, I know that ES is not Google; I only look to Google for ideas 
that are nice and for hints about their implementation based upon their 
external behavior.

Then, your problem could be solved using a phrase query with no slop.

Maybe your testMulti field is analyzed but no stop words are ignored. Or, 
maybe testMulti.raw is analyzed but with no stop words ignored. Either way, 
you'd have the full set of words indexed for a phrase query to quickly find 
the sub-match. At least, much, much more quickly than a grep-style wildcard 
search against a non-analyzed form of the field.
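
A phrase query with no slop could look roughly like the sketch below. It builds the request body as a Python dict; the field name testMulti comes from this thread, while the helper name phrase_query and the sample text are only illustrative.

```python
import json


def phrase_query(field, text):
    """Build a match_phrase body: terms must appear in order with no gaps."""
    return {"query": {"match_phrase": {field: {"query": text, "slop": 0}}}}


# Body to send to the index's _search endpoint.
print(json.dumps(phrase_query("testMulti", "the wall")))
```

This only behaves as an exact sub-phrase match if the field is analyzed without stop-word removal, as described above.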

I also used phrases within my own table-based synonym matching. Instead of 
using ES synonyms, I create a separate type with lists of synonyms. A query 
for a synonym is first directed to that type to fetch a list of synonyms; 
then an OR query is generated. This has proven to be fast enough. It has 
the benefit of allowing the synonyms to be updated with no changes to the 
97-million documents that are already indexed. And, synonyms can be phrases, 
for example: HUGE - VERY BIG. So now a synonym query for HUGE can find The 
Very Big Dog. Likewise, a synonym query for the phrase VERY BIG can find The 
Huge Dog. Really cool; just a matter of Java coding on the front end. And 
ES does the heavy lifting underneath. But I digress a little...
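
The table-based synonym expansion described above might be sketched like this. The original front end is Java; this Python version is only an illustration, and it assumes a hypothetical in-memory dict (SYNONYMS) standing in for the separate synonym type and a hypothetical field name ("title").

```python
# Stand-in for the separate synonym type; in the setup described above this
# list would be fetched from Elasticsearch first.
SYNONYMS = {
    "huge": ["very big"],
    "very big": ["huge"],
}


def synonym_query(field, text, table=SYNONYMS):
    """Expand text into an OR of phrase queries, one per synonym variant."""
    key = text.lower()
    variants = [key] + table.get(key, [])
    return {
        "query": {
            "bool": {
                "should": [{"match_phrase": {field: v}} for v in variants],
                "minimum_should_match": 1,
            }
        }
    }


print(synonym_query("title", "HUGE"))
```

Because every variant is a phrase query, multi-word synonyms such as VERY BIG are handled the same way as single words.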

Hope this helps.

Brian



Re: incrementally scaling ES from the small data

2014-01-07 Thread Mark Walkom
As a really, really rough guide;
Start with a small instance, 4-8G RAM (2-4G heap). Keep loading documents
until things start to slow down (ie query/update responsiveness drops). Add
a new node.
Rinse and repeat.

If you have one node there is no point using replicas as they have nowhere
to go. You can easily add replicas later though so it's no big deal.
Shards is a little harder, start with the standard/default of 8 shards and
go from there. Using aliases can allow you to reindex your data later if
you feel you may want to change this.
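
As a rough sketch of the alias approach: the application always queries an alias, and after reindexing into a new index the alias is moved in one request to the _aliases endpoint. The index and alias names below are hypothetical; only the request body is built here.

```python
import json


def alias_swap_actions(alias, old_index, new_index):
    """Build the body for POST /_aliases that moves an alias between indices."""
    return {
        "actions": [
            {"remove": {"index": old_index, "alias": alias}},
            {"add": {"index": new_index, "alias": alias}},
        ]
    }


# Hypothetical names: "products" is the alias the application searches.
print(json.dumps(alias_swap_actions("products", "products_v1", "products_v2")))
```

The new index can be created with a different shard count, which is what makes the alias useful for changing that setting later.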

You can monitor your cluster with a range of monitoring plugins -
elasticHQ, kopf, elasticsearch-monitoring, bigdesk. Just search for them on
github.


As Boaz mentioned, it really does depend on what you are doing. Chances are
you will go through all this and get to a point where you want to rebuild
your cluster with all your gained knowledge!

Regards,
Mark Walkom

Infrastructure Engineer
Campaign Monitor
email: ma...@campaignmonitor.com
web: www.campaignmonitor.com


On 8 January 2014 09:18, Boaz Leskes b.les...@gmail.com wrote:

 Hi Adolfo,

 The best way to scale depends on your data and how it behaves. You can
 watch this great talk by Shay about two use cases to get inspired:
 http://www.elasticsearch.org/videos/big-data-search-and-analytics/

 Cheers,
 Boaz


 On Tuesday, January 7, 2014 8:13:18 PM UTC+1, Adolfo Rodriguez wrote:

 Hi, I plan to start with a small project, initially with a small amount of
 data (a few thousand records) to learn how ES responds, and then
 incrementally increase data and resources on demand, up to big data, taking
 advantage of ES scalability.

 Is there a document describing such a strategy, i.e.:

 * how to properly configure a small basic deployment with good
 performance on low resources? (shards, nodes, clusters...)

 * then, how to detect when it becomes necessary to add resources
 (shards, nodes...) as the data load increases?


 All the docs that I find on scaling ES start with deployments of millions
 or billions of records.

 Alternatively, any advice on properly configuring ES for small
 data? (as a starting point?)

 Thanks



Re: incrementally scaling ES from the small data

2014-01-07 Thread Adolfo Rodriguez
Thanks both for your comments.

Shards is a little harder, start with the standard/default of 8 shards and go 
from there.


* This is the point that is confusing me the most. For a very small initial 
deployment, with a few thousand docs, why not just define 1 shard 
with no replica? What criteria did you use to set 8 shards as a default? (BTW, 
the defaults in ES 0.90.5 are 5 Successful Shards, 5 Unassigned Shards, aren't 
they?)

* Suppose that you start with the smallest possible setup: 1 cluster, 1 node, 
1 shard, no replica. Will I be able to incrementally scale any of these 
settings up? And will I also be able to scale any of these settings down 
afterwards (or will I need to repopulate ES in any particular case)? The idea 
is to test different configs.

* In my current particular case, can I scale down my current 5 shards/1 
replica (default 0.90.5 AFAIK) to 1 shard/no replica? And start from there?

The reason I am concerned about this is that I see a lot of sockets (maybe 
200 on my system, with 2 ES instances in different apps on the same machine) 
and want to understand where they come from and how to allocate the optimum. I 
watched Shay's presentation yesterday but could not grasp this info.

Thanks



Re: incrementally scaling ES from the small data

2014-01-07 Thread Ivan Brusic
Elasticsearch uses consistent hashing, so you cannot change the number of
shards for an index.
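
Why the shard count is fixed can be illustrated with a small sketch. As far as the Elasticsearch documentation describes it, a document's shard is chosen as a hash of its routing value modulo the number of primary shards. The code below assumes that scheme and uses crc32 purely as a stand-in hash (it is not the hash Elasticsearch uses).

```python
import zlib


def shard_for(doc_id, num_shards):
    """Pick a shard as hash(id) modulo the shard count (crc32 is a stand-in)."""
    return zlib.crc32(doc_id.encode("utf-8")) % num_shards


doc_ids = ["doc-%d" % i for i in range(1000)]

# Count documents whose shard would differ if the index went from 5 to 6 shards.
moved = sum(1 for d in doc_ids if shard_for(d, 5) != shard_for(d, 6))
print("%d of %d documents would map to a different shard" % (moved, len(doc_ids)))
```

With modulo routing, changing the divisor sends most documents to a different shard, so existing data could no longer be found where the new count says it should be. That is why a new shard count means a new index.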

If you can reindex data, then you can create a new index with a different
number of shards and simply reindex. If your data is temporal in nature,
you can create a new index per day/week/month and these new indices can
have a different shard value. You can search against multiple indices even
if they have different shard values.
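
A minimal sketch of the index-per-day idea, assuming a hypothetical "logs" prefix and a date-suffix naming convention (both are illustrative choices, not something Elasticsearch requires). Several indices can be searched at once by listing them comma-separated in the request path.

```python
from datetime import date, timedelta


def daily_indices(prefix, start, days):
    """Return one index name per day, e.g. logs-2014.01.05."""
    return [
        "%s-%s" % (prefix, (start + timedelta(days=i)).strftime("%Y.%m.%d"))
        for i in range(days)
    ]


names = daily_indices("logs", date(2014, 1, 5), 3)
search_path = "/" + ",".join(names) + "/_search"
print(search_path)
```

Each of these indices can be created with its own number of shards, so the value can be adjusted going forward without touching older data.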

IMHO, shard values in the high single digits (5-10) is a great starting
point. Even with a single node cluster, the default number of shards (5)
should not cause any performance issues.

Cheers,

Ivan




Design practices for hosting multiple clusters/on-demand cluster creation?

2014-01-07 Thread Josh Harrison
While ES is still in a pre-deployment stage at my job, there is growing 
interest in it. For various reasons, a monster cluster holding everyone's 
stuff is simply not possible. Individual projects require complete control 
over their data and the culture and security requirements here are such 
that doing something like always naming project 1's indexes 
PROJECT_1_something will not fly.
We have a fairly beefy hadoop cluster hosting our content currently, along 
with a separate head node acting as the master.
In this situation, is it simply a matter of starting up new processes on 
each node pointed at different configuration profiles and tying specific 
ports to specific projects/clusters?

Basically, is there an established way to build on-demand clusters, given a 
set of resources? We'll layer something in front of it to deal with access 
control/etc.

Thanks!
-Josh



Re: incrementally scaling ES from the small data

2014-01-07 Thread Adolfo Rodriguez
Thanks Ivan,
 

 Elasticsearch uses consistent hashing, so you cannot change the number of 
 shards for an index.


So, I understand that, once the index is created, it is only possible to 
scale nodes, clusters and replicas up and down, but not shards. 
Interesting.
 

 IMHO, shard values in the high single digits (5-10) is a great starting 
 point. Even with a single node cluster, the default number of shards (5) 
 should not cause any performance issues.


I am worried about the 200 established sockets on my machine 
(running 2 ES instances) since I suspect they are causing some random data loss 
when fetching highlighting information. And I was wondering if setting just 1 
shard/0 replicas on each ES would get rid of these unwanted sockets (?). Why 
is it advised to start with (5-10) rather than with (1-0) * 2 ES? Any reason?

Thanks



Re: incrementally scaling ES from the small data

2014-01-07 Thread Ivan Brusic
An increase in shards will not cause an increase in sockets used. Each node's
shard action is responsible for gathering the responses from each shard at the
file level before sending the response back to the client.

Since each shard is actually its own Lucene index, an increase in shards
will increase metrics at the IO level, especially the number of open file
descriptors.

It is advised to start off with 5 because that would allow you to scale an
index horizontally without needing to reindex. You can increase your
cluster from 1 node to 5 and each node will have a piece of the index instead
of the entire index. Beyond that number, you can distribute the index
with more replicas. More shards increase availability IMHO. Ultimately you
do not want large shards for performance reasons.
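
The scaling argument can be pictured with a toy layout. The sketch below spreads primary shards round-robin over nodes; this is a deliberate simplification and not how Elasticsearch's allocator actually decides placement.

```python
def spread_primaries(num_shards, num_nodes):
    """Toy round-robin placement of primary shards onto nodes."""
    layout = {node: [] for node in range(num_nodes)}
    for shard in range(num_shards):
        layout[shard % num_nodes].append(shard)
    return layout


for nodes in (1, 5, 6):
    print(nodes, spread_primaries(5, nodes))
```

With 5 shards, a single node holds all of them, 5 nodes hold one each, and a sixth node has no primary left to take. Beyond that point only replicas can put the extra node to work.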

-- 
Ivan


On Tue, Jan 7, 2014 at 5:23 PM, Adolfo Rodriguez pellyado...@yahoo.es wrote:


 I am worried about the 200 established sockets on my machine
 (running 2 ES instances) since I suspect they are causing some random data loss
 when fetching highlighting information. And I was wondering if setting just 1
 shard/0 replicas on each ES would get rid of these unwanted sockets (?). Why
 is it advised to start with (5-10) rather than with (1-0) * 2 ES? Any reason?





How to index an existing json file

2014-01-07 Thread ZenMaster80
Hi,

I am just starting with ElasticSearch. I would like to know how to index a 
simple JSON document, books.json, that has the following in it. Where do I 
place the document? I placed it in the root directory of Elasticsearch and in 
the /bin folder.

{“books”:[{“name”:”life in heaven”,”author”:”Mike Smith”},{“name”:”get 
rich”,”author”:”Joe Shmoe”},{“name”:”luxury properties”,”author”:”Linda 
Jones”]}}


$ curl -XPUT "http://localhost:9200/books/book/1" -d @books.json

Warning: Couldn't read data from file "books.json", this makes an empty 
POST.

{"error":"MapperParsingException[failed to parse, document is 
empty]","status":400}


Thanks



Re: How to index an existing json file

2014-01-07 Thread Ivan Brusic
The JSON file is used by the curl command, so in your example it should be
in the same directory in which you executed the command (current directory).

-- 
Ivan




Re: How to index an existing json file

2014-01-07 Thread ZenMaster80
Great. Do you know why I am getting the following?

{"error":"MapperParsingException[failed to parse]; nested: 
JsonParseException[Unrecognized token 'life': was expecting ('true', 
'false' or 'null')\n at [Source: [B@5c9a9d06; line: 1, column: 35]]; 
","status":400}

data:

{“books”:[{“name”:”life in heaven”,”author”:”Mike Smith”},{“name”:”get 
rich”,”author”:”Joe Shmoe”},{“name”:”luxury properties”,”author”:”Linda 
Jones”}]}
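
One thing worth checking is whether the file is valid JSON at all: the data as pasted here uses typographic (curly) quotes, which JSON parsers do not accept. Whether that is the cause of this particular error is an assumption. A quick local check, using a shortened sample of the data above and a hypothetical helper name (straighten_quotes), could look like this:

```python
import json

# Shortened sample of the data above, with the same curly quotes.
raw = '{“books”:[{“name”:”life in heaven”,”author”:”Mike Smith”}]}'


def straighten_quotes(text):
    """Replace left/right curly double quotes with plain ASCII quotes."""
    return text.replace("\u201c", '"').replace("\u201d", '"')


try:
    json.loads(raw)
    parsed_as_is = True
except ValueError:
    # json.JSONDecodeError is a subclass of ValueError.
    parsed_as_is = False

doc = json.loads(straighten_quotes(raw))
print(parsed_as_is, doc["books"][0]["name"])
```

Blindly replacing curly quotes would also alter any that legitimately appear inside string values, so this is meant as a diagnostic rather than a general fix. Saving the file from a plain-text editor avoids the problem at the source.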





How many metadata fields exist of MP3 file ?

2014-01-07 Thread HongXuan Ji
Hi all,

I am wondering how many metadata fields exist for an MP3 file when I post it 
into ElasticSearch using the mapper-attachment plugin. 

In Solr we can get the field information through the endpoint 
SOLR_HOST/update/extract?extractOnly=true, 

but is there any way to get such information in ElasticSearch? Besides 
MP3 files, how about doc files? 

I know that ElasticSearch uses Tika to support these operations. Can you give 
me an example of fetching a specific field for a specific file format?

Regards,

Ivan 




Re: Too many open files

2014-01-07 Thread Adolfo Rodriguez
Happily, the problem of missing highlight records looks to be gone after 
making a config change.

* Initially I had 2 ES nodes in 2 different apps (a Tomcat and a standalone) 
configured identically (both listening for incoming TransportClient requests 
on port 9300 and both opened with client(false)) and a third ES connecting to 
them, opened with new TransportClient(), to fetch highlighting info. It looks 
like this third ES was randomly losing highlighting records. (?)

* What I did to fix it was a configuration change to have only one 
client(false) ES listening for TransportClients and 2 new 
TransportClient()s connecting to it.

It looks like this change fixes the issue, which was some kind of coupling 
between both client(false) ESs listening on port 9300.

Regards



Re: incrementally scaling ES from the small data

2014-01-07 Thread Adolfo Rodriguez
Thanks Ivan, makes sense. Still could not test how sockets relate to shards 
and why I automatically get 10 established sockets when opening a client:

node = builder.client(clientOnly).data(!clientOnly).local(local).node();

client = node.client();


on default ES configuration, and many many more sockets after (up to 200), 
and how this number changes when increasing/decreasing the number of shards, 

but happily I managed to fix the initial issue of highlighting info being 
randomly lost with a config change, as described here:

https://groups.google.com/d/msg/elasticsearch/3t6UL_vzM7o/TLnV2m2B1NAJ


so sockets do not look like an issue anymore. 

Regards.



Re: Transport Client hangs in my web application during search.

2014-01-07 Thread Jason Wee
Does it show anything in the log? Perhaps wrap your code in a try/catch block
and set a query timeout.

HTH

/Jason


On Wed, Jan 8, 2014 at 4:41 AM, Search User feedwo...@gmail.com wrote:

 I have a web application in which I create a Transport Client using Spring
 (singleton) and inject it into my service. When I receive a request in my
 controller, controller calls the service and service uses the transport
 client to execute the query and return the results. When I deploy this
 application in tomcat, I have the client created but when I execute the
 query, client hangs.

 If I create the client for every request (in my service) and run the
 query, everything is fine. Can some one help me understand this behavior?

 Following is my code to create the Client object.

 Settings settings = ImmutableSettings.settingsBuilder()
         .put("cluster.name", "mysearchcluster")
         .put("client.transport.sniff", true)
         .build();
 Client client = new TransportClient(settings).addTransportAddress(
         new InetSocketTransportAddress("10.150.200.101", 9300));



 Thanks



Re: How to index an existing json file

2014-01-07 Thread David Pilato
Start with a clean index:

curl -XDELETE "http://localhost:9200/books/"

You probably have a bad mapping (some docs already indexed?)

If you still have problems, please gist a full curl recreation. See 
http://www.elasticsearch.org/help/


--
David ;-)
Twitter : @dadoonet / @elasticsearchfr / @scrutmydocs




Re: Transport Client hangs in my web application during search.

2014-01-07 Thread David Pilato
Your code looks good to me.
Don't create multiple clients; create only one for your whole application.

As Jason wrote, look at logs.

--
David ;-)
Twitter : @dadoonet / @elasticsearchfr / @scrutmydocs


On Jan 8, 2014, at 07:40, Jason Wee peich...@gmail.com wrote:

Does it show anything in the log? Perhaps wrap your code in a try/catch block 
and set a query timeout. 

HTH 

/Jason


 On Wed, Jan 8, 2014 at 4:41 AM, Search User feedwo...@gmail.com wrote:
 I have a web application in which I create a Transport Client using Spring 
 (singleton) and inject it into my service. When I receive a request in my 
 controller, the controller calls the service, and the service uses the 
 transport client to execute the query and return the results. When I deploy 
 this application in Tomcat, the client is created, but when I execute the 
 query, the client hangs. 
 
 If I create the client for every request (in my service) and run the query, 
 everything is fine. Can someone help me understand this behavior?
 
 Following is my code to create the Client object.
 
 Settings settings = ImmutableSettings.settingsBuilder()
         .put("cluster.name", "mysearchcluster")
         .put("client.transport.sniff", true)
         .build();
 Client client = new TransportClient(settings)
         .addTransportAddress(new InetSocketTransportAddress("10.150.200.101", 9300));
 
 
 
 Thanks


Re: Design practices for hosting multiple clusters/on-demand cluster creation?

2014-01-07 Thread David Pilato
You could look at the Chef cookbook: 
https://github.com/elasticsearch/cookbook-elasticsearch
http://www.elasticsearch.org/tutorials/deploying-elasticsearch-with-chef-solo/

Does it help?

--
David ;-)
Twitter : @dadoonet / @elasticsearchfr / @scrutmydocs


On Jan 8, 2014, at 02:01, Josh Harrison hij...@gmail.com wrote:

While ES is still in a pre-deployment stage at my job, there is growing 
interest in it. For various reasons, a monster cluster holding everyone's stuff 
is simply not possible. Individual projects require complete control over their 
data, and the culture and security requirements here are such that doing 
something like always naming project 1's indexes PROJECT_1_something will not 
fly.
We have a fairly beefy Hadoop cluster hosting our content currently, along with 
a separate head node acting as the master.
In this situation, is it simply a matter of starting up new processes on each 
node pointed at different configuration profiles and tying specific ports to 
specific projects/clusters?

Basically, is there an established way to build on-demand clusters, given a set 
of resources? We'll layer something in front of it to deal with access 
control/etc.
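
As a rough illustration of the "separate processes with separate configuration profiles" idea, here is a minimal sketch that generates one config directory per project, each with its own cluster name, ports, and data path. The project names, the es-clusters directory, and the port numbering are assumptions made up for this example; it only writes the elasticsearch.yml files and does not start any nodes.

```shell
#!/bin/sh
# Hypothetical base directory, project names, and starting ports.
BASE_DIR="${BASE_DIR:-./es-clusters}"
http_port=9200
transport_port=9300

for project in project1 project2; do
  conf_dir="$BASE_DIR/$project/config"
  mkdir -p "$conf_dir"

  # Each project gets its own cluster name, ports, and storage paths,
  # so nodes of different projects do not join the same cluster.
  cat > "$conf_dir/elasticsearch.yml" <<EOF
cluster.name: $project
http.port: $http_port
transport.tcp.port: $transport_port
path.data: $BASE_DIR/$project/data
path.logs: $BASE_DIR/$project/logs
EOF

  # Give the next project the next pair of ports.
  http_port=$((http_port + 1))
  transport_port=$((transport_port + 1))
done
```

Each Elasticsearch process would then be started against its own config directory; how that is passed on the command line depends on the Elasticsearch version in use, so it is left out here.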

Thanks!
-Josh