date:20140107

Re: Exception cause unwrapping ran for 10 levels

2014-01-07 Thread Jason Wee

Jörg,

Done, https://github.com/elasticsearch/elasticsearch/issues/4639

When I investigated this issue today, I queried for the timestamps at which
the exceptions occurred, and data had been indexed during that time. I ran
the query because we were worried that no data was being indexed while the
exceptions were happening, which would mean lost data.

Thank you.

Jason

On Tue, Jan 7, 2014 at 4:34 PM, joergpra...@gmail.com joergpra...@gmail.com
wrote:

Yes, it looks like two nodes do not agree about an update action and a
version conflict is pinging between them, node1 and node4.

Not sure if this happens during index recovery or while an update is
executed, but it is definitely worth raising an issue on the Elasticsearch
GitHub so that the Elasticsearch core team can have a look. It might be some
kind of deadlock.

Jörg

--
You received this message because you are subscribed to the Google Groups
elasticsearch group.
To unsubscribe from this group and stop receiving emails from it, send an
email to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit
https://groups.google.com/d/msgid/elasticsearch/CAKdsXoGKNB-1KXab4eWhDnKpe4szdPsidEWq2his2j%3DfPwU7Zw%40mail.gmail.com
.

For more options, visit https://groups.google.com/groups/opt_out.

Sorting by deeply nested filters

2014-01-07 Thread Vesa Marttila

Hi,

I am trying to sort with a twice-nested filter, but this doesn't seem to work.
Is this even supposed to be possible? I can provide the query if necessary,
but it is quite complicated and would require a bit of obfuscation.

Sincerely,
Vesa Marttila

Re: Query: Parents with at least x children of type y

2014-01-07 Thread Alexander Stautner

We want to avoid changing the parent doc, because new child_types may be
added or old child_types removed, and we want the child_types to be
independent of the parent_type.

So it seems that at the moment there is no way to do such queries
with Elasticsearch?

Thanks for helping.

On Tuesday, 7 January 2014 10:39:41 UTC+1, David Pilato wrote:

I would probably add a num_of_children field in parent doc and update it
when a new child is added or removed.

But I guess it depends on your actual use case!
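
A minimal sketch of the counter idea above, written as plain request bodies. The field name num_of_children comes from the suggestion; the helper names, the update script and the use of a filtered query with a range filter are assumptions for illustration, not a tested recipe.

```python
def counter_update_body(delta):
    """Body for an update request that adjusts the parent's child counter.

    Assumes scripted updates are enabled; delta is +1 when a child is
    added and -1 when one is removed.
    """
    return {
        "script": "ctx._source.num_of_children += delta",
        "params": {"delta": delta},
    }


def parents_with_at_least(min_children):
    """Search body that keeps only parents whose counter is >= min_children."""
    return {
        "query": {
            "filtered": {
                "query": {"match_all": {}},
                "filter": {
                    "range": {"num_of_children": {"gte": min_children}}
                },
            }
        }
    }
```

A single counter only answers "at least x children"; counting children per type or per attribute value would need one counter per case, which is where the use case starts to matter.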

--
David ;-)
Twitter : @dadoonet / @elasticsearchfr / @scrutmydocs

On 7 Jan 2014 at 08:15, Alexander Stautner alexander...@gmail.com wrote:

Sorry for bumping, but I need to know whether it is possible to answer the
question above with Elasticsearch.

On Thursday, 2 January 2014 15:22:32 UTC+1, Alexander Stautner wrote:

Hello,
after some research without any results I have a question about
parent/child relations.

The case:

I have a parent of type parent_type which has children of different
types e.g. child_type_1, child_type_2, child_type_3.

My Question is:

Is there any way to get only the parents that have at least x
children of type child_type_2 with a specific value in an attribute?

e.g

parent_type: family
child_type_1: girl attribute:name
child_type_2: boy attribute:name
child_type_3: cat attribute:name

And I want to have all families which have at least three girls with name
Jane.

Thank you for your help,
Alex

Order results by value in one of the array entries.

2014-01-07 Thread Johan E

Hi,

I'm trying to order the result of a query by a specified entry in an array.

Here is a sample entry


{
  "product_name": "product alfa",
  "product_id": "4a86c92ccd26111d7ba0eada7da6a75af",
  "description": "This is a sample product",
  "image_id": "product_a.jpg",
  "inventory": [
    {
      "warehouse": "warehouse_a",
      "stock": 99
    },
    {
      "warehouse": "warehouse_b",
      "stock": 19
    },
    {
      "warehouse": "warehouse_c",
      "stock": 99
    }
  ]
}

If there were more products containing alfa, I would (for example) want 
to sort them by the stock of one warehouse.

I'm currently using a query like:

POST _search
{
  "query": {
    "match": {
      "product_name": {
        "query": "alfa",
        "type": "phrase"
      }
    }
  },
  "filter": {
    "bool": {
      "must": [
        {
          "term": {
            "availability.warehouse": "warehouse_a"
          }
        }
      ]
    }
  }
}

I would like the results sorted by stock (for warehouse_a only) descending.

Any ideas?
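
One direction that may be worth trying, sketched as a request body. It assumes the inventory array is mapped as a nested type (the sample above does not say so) and that the nested_path and nested_filter sort options are available in the version in use. The helper names are made up for illustration.

```python
def sort_by_warehouse_stock(warehouse, order="desc"):
    """Sort clause on inventory.stock, restricted to a single warehouse.

    Assumes "inventory" is a nested field, so that stock and warehouse
    from the same array entry stay together.
    """
    return [
        {
            "inventory.stock": {
                "order": order,
                "nested_path": "inventory",
                "nested_filter": {
                    "term": {"inventory.warehouse": warehouse}
                },
            }
        }
    ]


def search_body(phrase, warehouse):
    """Phrase match on product_name, sorted by stock in one warehouse."""
    return {
        "query": {
            "match": {"product_name": {"query": phrase, "type": "phrase"}}
        },
        "sort": sort_by_warehouse_stock(warehouse),
    }
```

Without a nested mapping the array entries are flattened, so a plain sort on inventory.stock cannot tell which warehouse a stock value belongs to.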

Re: Sorting by deeply nested filters

2014-01-07 Thread Vesa Marttila

On Tuesday, January 7, 2014 12:15:42 PM UTC+2, Vesa Marttila wrote:

 Hi,

 I am trying to sort with a twice-nested filter, but this doesn't seem to work.
 Is this even supposed to be possible? I can provide the query if necessary,
 but it is quite complicated and would require a bit of obfuscation.

 Sincerely,
 Vesa Marttila


Just to add, the filter when used for queries works as desired, the 
problems only occur when using it in sorting.

Vesa

Re: ElasticsearchHadoop Hive integration issue

2014-01-07 Thread Costin Leau

Hi,

The 'es.resource' you specified is incorrect - you need to specify both an
index and a type - e.g. myIndex/products
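
To make the index/type requirement concrete, here is a small illustrative check. It is not part of elasticsearch-hadoop; the helper name is hypothetical and it only looks at the shape of the string.

```python
def has_index_and_type(resource):
    """Return True when resource looks like 'index/type'.

    Purely a string-shape check: exactly two non-empty parts
    separated by a single slash.
    """
    parts = resource.strip("/").split("/")
    return len(parts) == 2 and all(parts)
```

Under this check, 'products' alone fails while 'myIndex/products' passes, which matches the fix suggested above.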

P.S. Are you using M1 or the current master - the latter should give a proper
error (and message).

Thanks,

On 07/01/2014 9:48 AM, Badal Mohapatra wrote:

Hi,

I am trying to index data from a Hive table into Elasticsearch, using
the latest elasticsearch-hadoop-master plugin.
My elasticsearch version is 0.90.9 and hive version is hive-0.11.0.

As per the documentation of elasticsearch-hadoop plugin (hive integration), I
successfully created an external table
with the below command

CREATE EXTERNAL TABLE es_products (
    sku int,
    rating float,
    name string,
    type string,
    saleprice float,
    department string,
    manufacturer string,
    userid string,
    category_name string,
    query string)
STORED BY 'org.elasticsearch.hadoop.hive.ESStorageHandler'
TBLPROPERTIES('es.resource' = 'products');

Even though the external table is created,
I am not able to either insert data into or even query the external table.
When I do a select * from es_products;
I get the below exception.

hive> select * from es_products;
OK
Failed with exception
java.io.IOException:java.lang.StringIndexOutOfBoundsException: String index out
of range: -1
Time taken: 1.699 seconds

Can someone please point out what I am doing wrong, and where?

Kind Regards,
Badal

--
Costin

Re: Hipchat Elasticsearch

2014-01-07 Thread Ivan Brusic

Here are some related links, including a video of a talk:
http://www.meetup.com/Elasticsearch-San-Francisco/events/141698772/

-- 
Ivan


On Tue, Jan 7, 2014 at 1:43 AM, Ümit Seren uemit.se...@gmail.com wrote:

 Interesting read about elasticsearch in HipChat


 http://highscalability.com/blog/2014/1/6/how-hipchat-stores-and-indexes-billions-of-messages-using-el.html

Upgrades causing Elastic Search downtime

2014-01-07 Thread Jenny Sivapalan

Hello,

We've upgraded Elastic Search twice over the last month and have
experienced downtime (roughly 8 minutes) during the roll out. I'm not sure
if it is something we are doing wrong or not.

We use EC2 instances for our Elastic Search cluster and cloud formation to
manage our stack. When we deploy a new version or change to Elastic Search
we upload the new artefact, double the number of EC2 instances and wait for
the new instances to join the cluster.

For example, 6 nodes form a cluster on v0.90.7. We upload the 0.90.9
version via our deployment process and double the number of nodes in the
cluster (12). The 6 new nodes will join the cluster with the 0.90.9
version.

We then want to remove each of the 0.90.7 nodes. We do this by shutting
down the node (using the head plugin), waiting for the cluster to rebalance
the shards and then terminating the EC2 instance. Then repeat with the next
node. We leave the master node until last so that it does the re-election
just once.
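
The removal loop described above can be sketched roughly as follows. Node names are made up, and the actual shutdown, rebalance check and EC2 termination are passed in as callables because they depend on the tooling; this only illustrates the ordering, with the master last.

```python
import time


def decommission_order(old_nodes, master):
    """Return the old nodes in shutdown order, keeping the master for last."""
    others = [node for node in old_nodes if node != master]
    if master in old_nodes:
        return others + [master]
    return others


def roll_out(old_nodes, master, shut_down, is_rebalanced, terminate,
             poll_seconds=5):
    """Shut down, wait for rebalancing, then terminate, one node at a time.

    shut_down, is_rebalanced and terminate are caller-supplied hooks
    (hypothetical here); nothing in this sketch talks to a real cluster.
    """
    for node in decommission_order(old_nodes, master):
        shut_down(node)
        while not is_rebalanced():
            time.sleep(poll_seconds)
        terminate(node)
```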

The issue we have found in the last two upgrades is that while the
penultimate node is shutting down the master starts throwing errors and the
cluster goes red. To fix this we've stopped the Elastic Search process on the
master and have had to restart each of the other nodes (though perhaps they
would have rebalanced themselves given more time?). We find that we
send an increased number of error responses to our clients during this time.

We've set our queue size for search to 300 and we start to see the queue
getting full:
at java.lang.Thread.run(Thread.java:724)
2014-01-07 15:58:55,508 DEBUG action.search.type[Matt Murdock]
[92036651] Failed to execute fetch phase
org.elasticsearch.common.util.concurrent.EsRejectedExecutionException:
rejected execution (queue capacity 300) on
org.elasticsearch.action.search.type.TransportSearchQueryThenFetchAction$AsyncAction$2@23f1bc3
at
org.elasticsearch.common.util.concurrent.EsAbortPolicy.rejectedExecution(EsAbortPolicy.java:61)
at
java.util.concurrent.ThreadPoolExecutor.reject(ThreadPoolExecutor.java:821)

But also we see the following error which we've been unable to find the
diagnosis for:
2014-01-07 15:58:55,530 DEBUG index.shard.service [Matt Murdock]
[index-name][4] Can not build 'doc stats' from engine shard state
[RECOVERING]
org.elasticsearch.index.shard.IllegalIndexShardStateException:
[index-name][4] CurrentState[RECOVERING] operations only allowed when
started/relocated
at
org.elasticsearch.index.shard.service.InternalIndexShard.readAllowed(InternalIndexShard.java:765)

Are we doing anything wrong or has anyone experienced this?

Thanks,
Jenny

Re: score based on term frequency only

2014-01-07 Thread Ivan Brusic

Great feature. However, it looks like it is only available in the master
branch: https://github.com/elasticsearch/elasticsearch/issues/3772

--
Ivan

On Tue, Jan 7, 2014 at 8:31 AM, Britta Weber britta.we...@elasticsearch.com
wrote:

You could also use a script as described here:

http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/modules-advanced-scripting.html

Cheers,
Britta

On Mon, Jan 6, 2014 at 2:13 AM, Ivan Brusic i...@brusic.com wrote:
You could provide your own Similarity class as a plugin. Don't have any
sample code in front of me, but it would be based on TFIDFSimilarity and
you would basically need to ignore the norms and other values.

http://lucene.apache.org/core/4_6_0/core/org/apache/lucene/search/similarities/TFIDFSimilarity.html

The IDF portion could probably remain since it ranks the different terms
in
your query, not the score of each term.
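
As a rough illustration of what ignoring the norms means for the apple example, here is a toy model (not Lucene code; the function name is made up). It keeps only the tf part, using the square root of the term frequency as Lucene's default similarity does, and drops length normalization and IDF entirely.

```python
import math


def tf_only_score(term, text):
    """Score a document by sqrt(term frequency), with no length norm."""
    freq = text.split().count(term)
    return math.sqrt(freq)


# The two sample documents from the question.
one_apple = tf_only_score("apple", "apple")
two_apples = tf_only_score("apple", "apple apple")
```

With the norm left out, the document containing the term twice gets the higher score, which is the ordering asked for.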

Cheers,

Ivan

On Sun, Jan 5, 2014 at 1:57 PM, Kevin S kevinste...@gmail.com wrote:

I would like to score based entirely on term count.

For example, given the following two documents:

1) { apple }

2) { apple apple }

Searching apple ranks the first before the second. I wish to rank the
second, in which the term occurs twice, with a higher score.

Can someone please point me in the right direction for this?

Thank you.

Scoring and Relative-ness based on Business Rules

2014-01-07 Thread David Mitchell

What is the best way to make products more relevant outside of the default 
scoring?

I have an unknown number of business rules that will dictate a document's 
relevance. That is, even if one document scores higher than the other, it's 
possible that the other document will be more relevant to the user. 

Given two products with similar titles but different attributes and the 
query ipad, I'd like to promote one over the other:

{
   "title_simple": "iPad Mini Case",
   "description_simple": "Royce Leather iPad Mini Case:...",
   "category": "Computers  Accessories",
   "brand": "Royce Leather",
   "id": 794809052574
}

{
   "title_simple": "Apple iPad mini (16GB, Wi-Fi + Sprint 4G, White)",
   "description_simple": "iPad mini features a beautiful 7.9\" display...",
   "category": "Electronics",
   "brand": "Apple",
   "id": 885909689712
}


A simple query scores the iPad case high:

{
   "query": { "term": { "title_simple": "ipad" }}
}


But business rules dictate that the actual iPad be on the top. 

I can run a filter or score based on the attribute or brand to get what I'm 
looking for:

{
   "query": {
      "function_score": {
         "query": { "term": { "title_simple": "ipad" } },
         "functions": [{
            "filter": { "term": { "category_simple": "electronics" } },
            "boost_factor": 2
         }]
      }
   }
}

But building a bunch of these isn't scalable or reasonable. 

I have an unknown number of these and that number will continue to grow. 
Some other examples:

- query xbox should promote consoles over games
- query macbook should promote Apple computers over macbook sleeves
- query Apple should promote Apple products and not food

Building a thousand queries based on function filters is unreasonable and 
unscalable. 

Some possible solutions I've considered:

- building a lookup table that will build the filter portion of the query 
(this could get unmaintainable)
- Including a pre-calculated score in the document (unfortunately, doesn't 
work on a per query basis, as the score may change based on the user's 
needs)
- Extending the DefaultSimilarity class (I'm not sure how this helps me in 
this scenario, though)
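
For what it's worth, the lookup-table option from the list above could look roughly like this. Only the ipad rule comes from the query shown earlier; the xbox entry, its field value and the helper names are hypothetical placeholders.

```python
# Query term -> list of boost rules. Only "ipad" is taken from the
# example above; "xbox" is a made-up entry to show a second rule.
BOOST_RULES = {
    "ipad": [
        {"field": "category_simple", "value": "electronics", "boost_factor": 2},
    ],
    "xbox": [
        {"field": "category_simple", "value": "consoles", "boost_factor": 2},
    ],
}


def build_query(term, rules=BOOST_RULES):
    """Wrap the base term query in function_score when rules exist."""
    base = {"term": {"title_simple": term}}
    functions = [
        {
            "filter": {"term": {rule["field"]: rule["value"]}},
            "boost_factor": rule["boost_factor"],
        }
        for rule in rules.get(term, [])
    ]
    if not functions:
        return {"query": base}
    return {"query": {"function_score": {"query": base, "functions": functions}}}
```

The table keeps the rules as data rather than as hand-written queries, but it still has to be maintained by someone, which is the concern raised above.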

What have other people done to solve these problems? Is there something 
else that I'm missing that could help?

Here's a runnable gist - 
https://gist.github.com/dlmitchell/826e8fb7ca89bed30e4a/raw/613be2c202b26f5899bdcfeac714737beb49/sample_mapping.sh



Re: Too many open files

2014-01-07 Thread Adolfo Rodriguez

Hi, my model is quite slow with just a few thousand documents.

I realised that, when opening a

node = builder.client(clientOnly).data(!clientOnly).local(local).node();
client = node.client();

from my Java program to ES with such a small model, ES automatically 
creates 10 sockets. Coincidentally, I have 10 shards (?).

* Is this the expected behavior?
* Can I reduce the number of ES shards dynamically to reduce the number of 
sockets or should I redeploy my ES install?
* By opening other connections I eventually reach up to 200 simultaneously open 
sockets and I am afraid that, when fetching highlight information, some 
of the results are randomly being lost. Could these missing results be 
a consequence of too many open sockets?

Thanks for your pointers.

Thanks

Beta2 Java Client: java.nio.channels.UnresolvedAddressException

2014-01-07 Thread davrob2

Hi,

I'm having difficulty connecting with the Java client to 1.0.0.Beta2. The 
cluster is up and healthy, and monitoring works fine using elasticsearch Head, 
elasticsearch HQ etc.

This is the stack trace I am getting:

https://gist.github.com/dav-rob/8304130 

thanks,

David.

incrementally scaling ES from the small data

2014-01-07 Thread Adolfo Rodriguez

Hi, I plan to start with a small project, initially with small data (a few 
thousand records), to learn how ES responds, and then incrementally increase 
data and resources on demand, up to big data, taking advantage of ES 
scalability.

Is there a document describing such a strategy, i.e.:

* how to properly configure a small, basic deployment with good performance 
on low resources? (shards, nodes, clusters...)

* then, how to keep detecting the need to incrementally add 
resources (shards/nodes...) as the data load increases?


All docs that I find on scaling ES start from deployments with millions or 
billions of records.

Alternatively, any advice on properly configuring ES for small data 
(as a starting point)?

Thanks

Re: Too many open files

2014-01-07 Thread Adolfo Rodriguez

I guess my problem with the excessive number of sockets could also be a 
consequence of having 2 JVMs running ES, one embedded in Tomcat and a second 
embedded in another Java app, as said here:

https://groups.google.com/forum/?hl=en-GB#!topicsearchin/elasticsearch/scale%7Csort:date%7Cspell:true/elasticsearch/m9IWpGzoLLE

Is there any experience running a single embedded ES (as jar files), for 
example in Tomcat's lib folder, being consumed by several Tomcat apps and 
other standalone apps in different JVMs?

Any opinion on this configuration as a starting point?

Re: Scoring and Relative-ness based on Business Rules

2014-01-07 Thread Justin Treher

I think you will find that for small documents, that aren't actually
documents at all, but really a mass of data points, such as a product
library, you won't even use the built in scoring at all. The built in
scoring works well for books and articles (long works of text). For a
product library, you will use an array of custom boosts through the
function score query. The key is to get all those data points in your
documents so that you can boost on matches.

For example, with xbox, you could have a keywords field that includes
xbox just for consoles. Maybe Xbox is the title of the product while games
just have Xbox listed as their console compatibility. Only matches in the
titles will score higher.

For the macbook, you could have an accessories flag where items flagged as
an accessory receive a negative boost.

For Apple food vs. Apple products, you can use sales data or user history.

The key to relevancy that works for your organization is giving
Elasticsearch all the data points it needs to base its decisions on. For
products, your best solution is a big old set of constant score queries
wrapped in some wild function score queries.
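
A small sketch of the accessory-flag idea, for illustration only. The is_accessory field and the helper name are hypothetical, and the "negative boost" is modelled here as a boost_factor between 0 and 1, which is an assumption about what was meant rather than a literal negative number.

```python
def demote_accessories(term, factor=0.5):
    """function_score body that scales down documents flagged as accessories.

    factor is expected to be between 0 and 1 so flagged documents
    rank lower but still match.
    """
    return {
        "query": {
            "function_score": {
                "query": {"term": {"title_simple": term}},
                "functions": [
                    {
                        "filter": {"term": {"is_accessory": True}},
                        "boost_factor": factor,
                    }
                ],
            }
        }
    }
```

The same rule then applies to every query term, so a macbook sleeve and an iPad case are both pushed down without a per-query entry.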

On Tuesday, January 7, 2014 12:36:43 PM UTC-5, David Mitchell wrote:

What is the best way to make products more relevant outside of the default
scoring?

Transport Client hangs in my web application during search.

2014-01-07 Thread Search User

I have a web application in which I create a Transport Client using Spring 
(singleton) and inject it into my service. When I receive a request in my 
controller, the controller calls the service and the service uses the transport 
client to execute the query and return the results. When I deploy this 
application in Tomcat, the client is created, but when I execute the 
query, the client hangs. 

If I create the client for every request (in my service) and run the query, 
everything is fine. Can someone help me understand this behavior?

Following is my code to create the Client object.

Settings settings = ImmutableSettings.settingsBuilder()
        .put("cluster.name", "mysearchcluster")
        .put("client.transport.sniff", true)
        .build();
Client client = new TransportClient(settings)
        .addTransportAddress(new InetSocketTransportAddress("10.150.200.101", 9300));



Thanks

Re: Scoring and Relative-ness based on Business Rules

2014-01-07 Thread David Mitchell

Thanks for your answer.

So, instead of relying on queries to pull out the right stuff, you're
suggesting to model the documents to the queries.

This suggests that there's a custom boost for every search term, which is
what I was hoping to avoid, if only because of the impossible task of going
through all our data and determining what to boost/not boost. This also
implies that there's another key/value store of queries-to-boost keywords,
which again could get costly to maintain.

If I'm understanding you correctly, it would look similar to what I
previously posted, but only with a larger (possibly dynamic) set of boost
queries.

Doing so is primarily a manual task - are there more automatic ways to
build up relevancy, or even tools/processes that help?

On Tuesday, January 7, 2014 11:50:40 AM UTC-8, Justin Treher wrote:

For the macbook, you could have an accessories flag where items flagged as
an accessory receive a negative boost.

For Apple food vs. Apple products, you can use sales data or user history.

On Tuesday, January 7, 2014 12:36:43 PM UTC-5, David Mitchell wrote:

What is the best way to make products more relevant outside of the
default scoring?

Given two products with similar titles but different attributes and the
query ipad, I'd like to promote one over the other:

{
title_simple: iPad Mini Case,
description_simple: Royce Leather iPad Mini Case:...,
category: Computers Accessories,
brand : Royce Leather,
id: 794809052574
}

{
title_simple: Apple iPad mini (16GB, Wi-Fi + Sprint 4G, White),
description_simple: iPad mini features a beautiful 7.9\ display...
,
category: Electronics,
brand : Apple,
id: 885909689712
}

A simple query scores the iPad case high:

{
query: { term: { title_simple: ipad }}
}

But business rules dictate that the actual iPad be on the top.

I can run a filter or score based on the attribute or brand to get what
I'm looking for:

{
query: {
function_score: {
query: { term: { title_simple: ipad } },
functions : [{
filter : { term: { category_simple: electronics
} },
boost_factor : 2
}]
}
}
}

But building a bunch of these isn't scalable or reasonable.

I have an unknown number of these and that number will continue to grow.
Some other examples:

- query xbox should promote consoles over games
- query macbook should promote Apple computers over macbook sleeves
- query Apple should promote Apple products and not food

Building a thousand queries based on functions filters is unreasonable
and unscalable.

Some possible solutions I've considered:

- building a lookup table that will build the filter portion of the query
(this could get unmaintainable)
- Including a pre-calculated score in the document (unfortunately,
doesn't work on a per query basis, as the score may change based on the
user's needs)
- Extending the DefaultSimilarity class (I'm not sure how this helps me in
this scenario, though)
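
The lookup-table idea from the first bullet could look roughly like the sketch below. It is only an illustration: the BOOST_RULES table, the build_query helper, the "consoles" category value and the boost values are hypothetical; the function_score/boost_factor structure and the field names are taken from the query shown earlier in this post.

```python
import json

# Hypothetical rule table: query term -> list of (field, value, boost) rules.
BOOST_RULES = {
    "ipad": [("category_simple", "electronics", 2)],
    "xbox": [("category_simple", "consoles", 2)],
}


def build_query(term):
    """Build a search body, adding one boost function per rule for the term."""
    key = term.lower()
    base = {"term": {"title_simple": key}}
    rules = BOOST_RULES.get(key, [])
    if not rules:
        # No business rule for this term: fall back to the plain query.
        return {"query": base}
    functions = [
        {"filter": {"term": {field: value}}, "boost_factor": boost}
        for field, value, boost in rules
    ]
    return {"query": {"function_score": {"query": base, "functions": functions}}}


print(json.dumps(build_query("ipad"), indent=2))
```

The table still has to be maintained by hand, which is the concern raised above; the sketch only shows that the query-building part itself stays small.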

What have other people done to solve these problems? Is there something
else that I'm missing that could help?

Here's a runnable gist -
https://gist.github.com/dlmitchell/826e8fb7ca89bed30e4a/raw/613be2c202b26f5899bdcfeac714737beb49/sample_mapping.sh


Any fix timeline for split brain issue: 2488

2014-01-07 Thread bitsofinfo . g

Hi, is there any timeline on a fix 
for https://github.com/elasticsearch/elasticsearch/issues/2488 ?

thanks!


Re: Match exact substring in not analyzed field

2014-01-07 Thread InquiringMind

This is an interesting problem. Typically, I take a dim view of stop words. I
would prefer that the client side avoid searching on them, if that is
desired, rather than having the engine ignore them. Then phrase matching can
work properly: a query such as The Wall can look for just Wall (ignoring
The as a stop word), while the Google-like +The Wall can look for The
Wall. Yeah, I know that ES is not Google; I only look to Google for ideas
that are nice and for hints about their implementation based upon their
external behavior.

Then, your problem could be solved using a phrase query with no slop.

Maybe your testMulti field is analyzed but no stop words are ignored. Or,
maybe testMulti.raw is analyzed but with no stop words ignored. Either way,
you'd have the full set of words indexed for a phrase query to quickly find
the sub-match. At least, much, much more quickly than a grep-style wildcard
search against a non-analyzed form of the field.
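
As a rough sketch of the phrase query with no slop mentioned above, the body could be built like this. The phrase_query helper is made up for illustration; testMulti is the field name from this thread, and match_phrase with a slop parameter is the standard query DSL form.

```python
import json


def phrase_query(field, text, slop=0):
    """Return a match_phrase search body.

    slop=0 means the words must appear adjacent and in the given order.
    """
    return {"query": {"match_phrase": {field: {"query": text, "slop": slop}}}}


print(json.dumps(phrase_query("testMulti", "The Wall")))
```

This only helps if the field is analyzed without dropping stop words, as described above; otherwise The would not be in the index to match against.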

I also used phrases within my own table-based synonym matching. Instead of
using ES synonyms, I create a separate type with lists of synonyms. A query
for a synonym is first directed to that type to fetch a list of synonyms;
then an OR query is generated. This has proven to be fast enough. It has
the benefit of allowing the synonyms to be updated with no changes to the
97-million documents that are already indexed. And, synonyms can be phrases,
for example: HUGE - VERY BIG. So now a synonym query for HUGE can find The
Very Big Dog. Likewise, a synonym query for the phrase VERY BIG can find The
Huge Dog. Really cool; just a matter of Java coding on the front end. And
ES does the heavy lifting underneath. But I digress a little...
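
The front-end expansion described above might be sketched as follows. This is not the actual Java code, just an illustration in Python: the SYNONYMS dictionary stands in for the separate synonym type, and the field name title is an assumption.

```python
# Hypothetical in-memory stand-in for the separate synonym type.
SYNONYMS = {
    "huge": ["very big"],
    "very big": ["huge"],
}


def synonym_query(field, text):
    """Expand text with its synonyms and OR the variants together as phrases."""
    variants = [text] + SYNONYMS.get(text.lower(), [])
    # A bool query with only "should" clauses matches if any clause matches.
    should = [{"match_phrase": {field: variant}} for variant in variants]
    return {"query": {"bool": {"should": should}}}


print(synonym_query("title", "HUGE"))
```

Because the synonyms live outside the indexed documents, changing the table changes query results without any reindexing, which is the benefit described above.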

Hope this helps.

Brian


Re: incrementally scaling ES from the small data

2014-01-07 Thread Mark Walkom

As a really, really rough guide:
Start with a small instance, 4-8G RAM (2-4G heap). Keep loading documents
until things start to slow down (i.e. query/update responsiveness drops). Add
a new node.
Rinse and repeat.

If you have one node there is no point using replicas as they have nowhere
to go. You can easily add replicas later though so it's no big deal.
Shards is a little harder, start with the standard/default of 8 shards and
go from there. Using aliases can allow you to reindex your data later if
you feel you may want to change this.
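
To make the alias suggestion concrete, here is a small sketch of the request body that moves an alias from an old index to a reindexed one in a single call to the _aliases endpoint. The alias_swap_actions helper and the index names products_v1/products_v2 are hypothetical.

```python
import json


def alias_swap_actions(alias, old_index, new_index):
    """Build the body for POST /_aliases that repoints alias in one request."""
    return {
        "actions": [
            {"remove": {"index": old_index, "alias": alias}},
            {"add": {"index": new_index, "alias": alias}},
        ]
    }


# The application keeps querying "products" while the data is reindexed
# into products_v2 (for example with a different number of shards).
print(json.dumps(alias_swap_actions("products", "products_v1", "products_v2"), indent=2))
```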

You can monitor your cluster with a range of monitoring plugins -
elasticHQ, kopf, elasticsearch-monitoring, bigdesk. Just search for them on
github.

As Boaz mentioned, it really does depend on what you are doing. Chances are
you will go through all this and get to a point where you want to rebuild
your cluster with all your gained knowledge!

Regards,
Mark Walkom

Infrastructure Engineer
Campaign Monitor
email: ma...@campaignmonitor.com
web: www.campaignmonitor.com

On 8 January 2014 09:18, Boaz Leskes b.les...@gmail.com wrote:

Hi Adolfo,

The best way to scale depends on your data and how it behaves. You can
watch this great talk by Shay about two use cases to get inspired:
http://www.elasticsearch.org/videos/big-data-search-and-analytics/

Cheers,
Boaz

On Tuesday, January 7, 2014 8:13:18 PM UTC+1, Adolfo Rodriguez wrote:

Hi, I plan to start with a small project with a small amount of data (a few
thousand records) to learn how ES responds and then incrementally increase data
and resources on demand, up to big data, taking advantage of ES
scalability.

Is there a document describing such a strategy, i.e.:

* how to properly configure a small basic deployment with good
performance on low resources? (shards, nodes, clusters...)

* then, how to detect when resources (shards, nodes...) need to be
added incrementally as the data load increases?

All the docs that I find on scaling ES start with deployments of
millions or billions of records.

Alternatively, any advice on properly configuring ES for small
data (as a starting point)?

Thanks


Re: incrementally scaling ES from the small data

2014-01-07 Thread Adolfo Rodriguez

Thanks both for your comments.

Shards is a little harder, start with the standard/default of 8 shards and go
from there.

* This is the point that confuses me the most. For a very small initial
deployment, with a few thousand docs, why not just define 1 shard
with no replica? What criteria did you use to set 8 shards as a default? (BTW,
the defaults in ES 0.90.5 are 5 successful shards and 5 unassigned shards,
aren't they?)

* Suppose that you start with the smallest setup: 1 cluster, 1 node,
1 shard, no replica. Will I be able to incrementally scale any of these
settings up? And will I also be able to scale any of these settings down
afterwards, or will I need to repopulate ES in some cases? The idea is
to test different configs.

* In my particular case, can I scale down my current 5 shards/1
replica (the 0.90.5 default, AFAIK) to 1 shard/no replica, and start from there?

The reason I am concerned about this is that I see a lot of sockets (maybe
200 on my system, with 2 ES instances in different apps on the same machine)
and want to understand where they come from and how to allocate the optimum
number. I watched Shay's presentation yesterday but could not grasp this info.

Thanks


Re: incrementally scaling ES from the small data

2014-01-07 Thread Ivan Brusic

Elasticsearch uses consistent hashing, so you cannot change the number of
shards for an index.
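
A rough illustration of why the shard count is fixed once documents are indexed: routing is commonly described as shard = hash(routing) % number_of_primary_shards. The sketch below assumes that form and uses zlib.crc32 purely as a stand-in for Elasticsearch's own hash function; the document IDs are made up.

```python
import zlib


def shard_for(doc_id, num_shards):
    """Pick a shard from the document ID (crc32 is only a stand-in hash)."""
    return zlib.crc32(doc_id.encode("utf-8")) % num_shards


ids = ["doc-%d" % i for i in range(1000)]

# Count how many documents would be looked up on a different shard
# if the same index suddenly had 6 shards instead of 5.
moved = sum(1 for d in ids if shard_for(d, 5) != shard_for(d, 6))
print(moved, "of", len(ids), "documents would map to a different shard")
```

Since most documents would end up being looked for on the wrong shard, the data has to be reindexed into a new index rather than resharded in place.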

If you can reindex data, then you can create a new index with a different
number of shards and simply reindex. If your data is temporal in nature,
you can create a new index per day/week/month and these new indices can
have a different shard value. You can search against multiple indices even
if they have different shard values.
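
For the temporal case, a minimal sketch of daily index names and a search across several of them could look like this. The logs prefix, the date format and both helper functions are assumptions; searching several indices by listing them comma-separated in the URL path is standard behaviour.

```python
from datetime import date, timedelta


def daily_indices(prefix, start, days):
    """Return one index name per day, e.g. logs-2014.01.07."""
    return [
        "%s-%s" % (prefix, (start + timedelta(days=i)).strftime("%Y.%m.%d"))
        for i in range(days)
    ]


def search_path(indices):
    """Build the URL path for a search spanning several indices."""
    return "/" + ",".join(indices) + "/_search"


names = daily_indices("logs", date(2014, 1, 6), 3)
print(search_path(names))
```

Each daily index can be created with its own number of shards, so the setting can be adjusted going forward without touching older indices.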

IMHO, shard values in the high single digits (5-10) is a great starting
point. Even with a single node cluster, the default number of shards (5)
should not cause any performance issues.

Cheers,

Ivan


Design practices for hosting multiple clusters/on-demand cluster creation?

2014-01-07 Thread Josh Harrison

While ES is still in a pre-deployment stage at my job, there is growing
interest in it. For various reasons, a monster cluster holding everyone's 
stuff is simply not possible. Individual projects require complete control 
over their data and the culture and security requirements here are such 
that doing something like always naming project 1's indexes 
PROJECT_1_something will not fly.
We have a fairly beefy hadoop cluster hosting our content currently, along 
with a separate head node acting as the master.
In this situation, is it simply a matter of starting up new processes on 
each node pointed at different configuration profiles and tying specific 
ports to specific projects/clusters?

Basically, is there an established way to build on-demand clusters, given a 
set of resources? We'll layer something in front of it to deal with access 
control/etc.
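
To picture the "one process per project, each with its own configuration profile and ports" idea, here is a sketch that derives a settings profile per project. cluster.name, http.port, transport.tcp.port and path.data are standard Elasticsearch settings; the project names, the port offsets and the data root are hypothetical, and this is not an established provisioning method.

```python
def project_settings(project, slot, data_root="/var/data/es"):
    """Return an isolated settings profile for one project's cluster.

    slot is a small integer (0, 1, 2, ...) used to give each project
    its own HTTP and transport port on the shared machines.
    """
    return {
        "cluster.name": "es-" + project,  # separate name keeps clusters apart
        "http.port": 9200 + slot,
        "transport.tcp.port": 9300 + slot,
        "path.data": data_root + "/" + project,
    }


for slot, project in enumerate(["alpha", "beta"]):
    print(project_settings(project, slot))
```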

Thanks!
-Josh


Re: incrementally scaling ES from the small data

2014-01-07 Thread Adolfo Rodriguez

Thanks Ivan,
 

 Elasticsearch uses consistent hashing, so you cannot change the number of 
 shards for an index.


So, I understand that once the index is created, it is only possible to
scale nodes, clusters and replicas up and down, but not shards.
Interesting.
 

 IMHO, shard values in the high single digits (5-10) is a *great starting 
 point.* Even with a single node cluster, the default number of shards (5) 
 should not cause any performance issues.


I am worried about the roughly 200 established sockets on my machine
(running 2 ES instances), since I suspect they are causing some random data
loss when fetching highlighting information. I was wondering whether setting
just 1 shard/0 replicas on each ES would get rid of these unwanted sockets.
Why is it advised to start with 5-10 shards rather than with 1 shard/0
replicas on each of the 2 ES instances? Any reason?

Thanks


Re: incrementally scaling ES from the small data

2014-01-07 Thread Ivan Brusic

An increase in shards will not cause an increase in sockets used. Each node's
shard action is responsible for gathering the responses from each shard at the
file level before sending the response back to the client.

Since each shard is actually its own Lucene index, an increase of shards
will increase metrics at the IO level, especially the number of open file
descriptors.

It is advised to start off with 5 because that allows you to scale an
index horizontally without needing to reindex. You can increase your
cluster from 1 to 5 nodes and each node will have a piece of the index instead
of the entire index. Beyond that number, you can distribute the index
with more replicas. More shards increase availability IMHO. Ultimately you
do not want large shards for performance reasons.
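
As a simple picture of the horizontal scaling described above, the sketch below spreads 5 primary shards over a growing number of nodes. The round-robin placement is an assumption made for illustration; the real allocation is decided by Elasticsearch.

```python
def distribute(num_shards, num_nodes):
    """Assign shard numbers to nodes round-robin (illustrative only)."""
    layout = {node: [] for node in range(num_nodes)}
    for shard in range(num_shards):
        layout[shard % num_nodes].append(shard)
    return layout


for nodes in (1, 2, 5, 6):
    print(nodes, "node(s):", distribute(5, nodes))
```

With 6 nodes the last node holds no primary shard, which is the point where adding replicas is what spreads the index further.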

--
Ivan


How to index an existing json file

2014-01-07 Thread ZenMaster80

Hi,

I am just starting with Elasticsearch and would like to know how to index a
simple JSON document, books.json, that has the content shown below. Where do I
place the document? I placed it in the root directory of Elasticsearch and in
the /bin folder.

{“books”:[{“name”:”life in heaven”,”author”:”Mike Smith”},{“name”:”get 
rich”,”author”:”Joe Shmoe”},{“name”:”luxury properties”,”author”:”Linda 
Jones”]}}


$ curl -XPUT "http://localhost:9200/books/book/1" -d @books.json

Warning: Couldn't read data from file "books.json", this makes an empty POST.

{"error":"MapperParsingException[failed to parse, document is empty]","status":400}


Thanks


Re: How to index an existing json file

2014-01-07 Thread Ivan Brusic

The JSON file is used by the curl command, so in your example it should be
in the same directory in which you executed the command (current directory).

--
Ivan


Re: How to index an existing json file

2014-01-07 Thread ZenMaster80

Great. Do you know why I am getting this error?

{"error":"MapperParsingException[failed to parse]; nested: JsonParseException[Unrecognized token 'life': was expecting ('true', 'false' or 'null')\n at [Source: [B@5c9a9d06; line: 1, column: 35]]; ","status":400}

data:

{“books”:[{“name”:”life in heaven”,”author”:”Mike Smith”},{“name”:”get 
rich”,”author”:”Joe Shmoe”},{“name”:”luxury properties”,”author”:”Linda 
Jones”}]}
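
One possible cause, not confirmed anywhere in this thread: the data as pasted uses typographic (curly) quotes rather than plain ASCII double quotes, and a JSON parser does not accept those as string delimiters. Assuming the file really contains them, a quick local check before sending it with curl could look like this sketch (the helper names and the shortened sample are made up):

```python
import json

# Curly quotes and the plain characters they would be replaced with.
TYPOGRAPHIC_QUOTES = {"\u201c": '"', "\u201d": '"', "\u2018": "'", "\u2019": "'"}


def find_typographic_quotes(text):
    """Return (index, character) pairs for every curly quote in text."""
    return [(i, ch) for i, ch in enumerate(text) if ch in TYPOGRAPHIC_QUOTES]


def is_valid_json(text):
    """Return True if text parses as JSON, False otherwise."""
    try:
        json.loads(text)
        return True
    except ValueError:
        return False


# Shortened sample using the same mix of curly quotes as the data above.
sample = "{\u201cname\u201d:\u201dlife in heaven\u201d}"
repaired = "".join(TYPOGRAPHIC_QUOTES.get(ch, ch) for ch in sample)

print(is_valid_json(sample), len(find_typographic_quotes(sample)))
print(is_valid_json(repaired))
```

If the check finds curly quotes in books.json, retyping the file in a plain-text editor would be worth trying before looking at mappings.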




How many metadata fields exist of MP3 file ?

2014-01-07 Thread HongXuan Ji

Hi all,

I am wondering how many metadata fields are available for an MP3 file when I
post it to Elasticsearch using the mapper-attachment plugin.

In Solr we can get the field information through the endpoint
SOLR_HOST/update/extract?extractOnly=true,

but is there any way to get such information in Elasticsearch? And besides
MP3 files, how about doc files?

I know that Elasticsearch uses Tika to support these operations. Can you give
me an example of fetching a specific field for a specific file format?

Regards,

Ivan 



Re: Too many open files

2014-01-07 Thread Adolfo Rodriguez

Happily, the problem of missing highlight records appears to be gone after
a config change.

* Initially I had 2 ES nodes in 2 different apps (a Tomcat app and a
standalone app) configured identically (both listening for incoming
TransportClient requests on port 9300 and both opened with client(false)),
plus a third ES instance, opened with new TransportClient(), connecting to
them to fetch highlighting info. It looks like this third ES instance was
randomly losing highlighting records.

* What I did to fix it was a configuration change so that only one
client(false) ES node listens for TransportClients, with 2 new
TransportClient()s connecting to it.

It looks like this change fixes the issue, which was some kind of coupling
between the two client(false) ES nodes listening on port 9300.

Regards


Re: incrementally scaling ES from the small data

2014-01-07 Thread Adolfo Rodriguez

Thanks Ivan, that makes sense. I still could not test how sockets relate to
shards, or why I automatically get 10 established sockets when opening a client:

node = builder.client(clientOnly).data(!clientOnly).local(local).node();

client = node.client();

on the default ES configuration, why there are many more sockets afterwards
(up to 200), or how this number changes when increasing/decreasing the number
of shards.

Happily, though, I managed to fix the initial issue of highlighting info being
randomly lost with a config change, as described here:

https://groups.google.com/d/msg/elasticsearch/3t6UL_vzM7o/TLnV2m2B1NAJ

so sockets no longer look like an issue.

Regards.


Re: Transport Client hangs in my web application during search.

2014-01-07 Thread Jason Wee

Does it show anything in the log? Perhaps wrap your code in a try/catch block
and set a query timeout.

HTH

/Jason

On Wed, Jan 8, 2014 at 4:41 AM, Search User feedwo...@gmail.com wrote:

I have a web application in which I create a Transport Client using Spring
(singleton) and inject it into my service. When I receive a request in my
controller, controller calls the service and service uses the transport
client to execute the query and return the results. When I deploy this
application in tomcat, I have the client created but when I execute the
query, client hangs.

If I create the client for every request (in my service) and run the
query, everything is fine. Can some one help me understand this behavior?

Following is my code to create the Client object.

Settings settings = ImmutableSettings.settingsBuilder()
        .put("cluster.name", "mysearchcluster")
        .put("client.transport.sniff", true)
        .build();
Client client = new TransportClient(settings)
        .addTransportAddress(new InetSocketTransportAddress("10.150.200.101", 9300));

Thanks


Re: How to index an existing json file

2014-01-07 Thread David Pilato

Start with a clean index:

curl -XDELETE "http://localhost:9200/books/"

You probably have a bad mapping (some docs already indexed?)

If you still have problems, please gist a full curl recreation. See
http://www.elasticsearch.org/help/

--
David ;-)
Twitter : @dadoonet / @elasticsearchfr / @scrutmydocs


Re: Transport Client hangs in my web application during search.

2014-01-07 Thread David Pilato

Your code looks good to me.
Don't create multiple clients; use only one for your whole application.

As Jason wrote, look at logs.

--
David ;-)
Twitter : @dadoonet / @elasticsearchfr / @scrutmydocs


Re: Design practices for hosting multiple clusters/on-demand cluster creation?

2014-01-07 Thread David Pilato

You could look at the Chef cookbook:
https://github.com/elasticsearch/cookbook-elasticsearch
http://www.elasticsearch.org/tutorials/deploying-elasticsearch-with-chef-solo/

Does it help?

--
David ;-)
Twitter : @dadoonet / @elasticsearchfr / @scrutmydocs

On Jan 8, 2014, at 02:01, Josh Harrison hij...@gmail.com wrote:

While ES is still in a pre-deployment stage at my job, there is growing
interest in it. For various reasons, a monster cluster holding everyone's stuff
is simply not possible. Individual projects require complete control over their
data, and the culture and security requirements here are such that doing
something like always naming project 1's indexes PROJECT_1_something will not
fly.
We have a fairly beefy Hadoop cluster currently hosting our content, along with
a separate head node acting as the master.
In this situation, is it simply a matter of starting up new processes on each
node pointed at different configuration profiles and tying specific ports to
specific projects/clusters?

Basically, is there an established way to build on-demand clusters, given a set
of resources? We'll layer something in front of it to deal with access
control/etc.

Thanks!
-Josh

Re: Exception cause unwrapping ran for 10 levels

Sorting by deeply nested filters

Re: Query: Parents with at least x children of type y

Order results by value in one of the array entries.

Re: Sorting by deeply nested filters

Re: ElasticsearchHadoop Hive integration issue

Re: Hipchat Elasticsearch

Upgrades causing Elastic Search downtime

Re: score based on term frequency only

Scoring and Relative-ness based on Business Rules

Re: Too many open files

Beta2 Java Client: java.nio.channels.UnresolvedAddressException

incrementally scaling ES from the small data

Re: Too many open files

Re: Scoring and Relative-ness based on Business Rules

Transport Client hangs in my web application during search.

Re: Scoring and Relative-ness based on Business Rules

Any fix timeline for split brain issue: 2488

Re: Match exact substring in not analyzed field

Re: incrementally scaling ES from the small data

Re: incrementally scaling ES from the small data

Re: incrementally scaling ES from the small data

Design practices for hosting multiple clusters/on-demand cluster creation?

Re: incrementally scaling ES from the small data

Re: incrementally scaling ES from the small data

How to index an existing json file

Re: How to index an existing json file

Re: How to index an existing json file

How many metadata fields exist of MP3 file ?

Re: Too many open files

Re: incrementally scaling ES from the small data

Re: Transport Client hangs in my web application during search.

Re: How to index an existing json file

Re: Transport Client hangs in my web application during search.

Re: Design practices for hosting multiple clusters/on-demand cluster creation?
