Re: How to get more than 10 terms bucket with Java API?

2014-07-31 Thread Alain Désilets
Thx. That did the trick. I now realize that I was setting the size on the 
return value of addAggregation(), when in fact, I needed to set it on the 
return value of the field() method.

On Tuesday, 29 July 2014 15:56:29 UTC-4, Adrien Grand wrote:
>
> Hi,
>
> You need to set the size on the terms aggregation:
>
> AggregationBuilders.terms("category_names").field("category").size(20)
>
>
> On Mon, Jul 28, 2014 at 9:22 PM, Alain Désilets wrote:
>
>> I have an ES where I have indexed a bunch of files. Each file is tagged 
>> with a category field. I want to write Java code to get a list of all the 
>> categories. I am trying to do this using the terms aggregation:
>>
>> QueryBuilder qb = QueryBuilders.matchAllQuery(); 
>> SearchResponse sr = 
>> esClient.prepareSearch()
>> .setQuery(qb)
>> .setSize(20)
>>  
>> .addAggregation(AggregationBuilders.terms("category_names").field("category")).setSize(20)
>> .execute()
>>  .actionGet(); 
>>
>> Terms terms = sr.getAggregations().get("category_names");
>> int numCategories = terms.getBuckets().size();
>> System.out.println("Number of category buckets found: "+ numCategories);
>>
>> But this code always prints out that it found 10 category buckets. Yet, 
>> if I execute the following query in Sense:
>>
>> GET files-index/file_with_category/_search
>> {
>>   "query": {
>>   "match_all": {}
>>   },
>>   "aggs": {
>> "categories": {
>>   "terms": {
>> "field": "category",
>> "size": 100
>>   }
>> }
>>   }
>> }
>>
>> I get 18 category buckets. What am I doing wrong in the Java code?
>>
>> Thx.
>>
>> Alain
>>
>>
>
>
>
> -- 
> Adrien Grand
>  

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/f4ff78a9-f042-43c6-ad61-d11ba5fb7ff1%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


How to get more than 10 terms bucket with Java API?

2014-07-28 Thread Alain Désilets
I have an ES where I have indexed a bunch of files. Each file is tagged 
with a category field. I want to write Java code to get a list of all the 
categories. I am trying to do this using the terms aggregation:

QueryBuilder qb = QueryBuilders.matchAllQuery(); 
SearchResponse sr = 
esClient.prepareSearch()
.setQuery(qb)
.setSize(20)
.addAggregation(AggregationBuilders.terms("category_names").field("category")).setSize(20)
.execute()
.actionGet(); 

Terms terms = sr.getAggregations().get("category_names");
int numCategories = terms.getBuckets().size();
System.out.println("Number of category buckets found: "+ numCategories);

But this code always prints out that it found 10 category buckets. Yet, if 
I execute the following query in Sense:

GET files-index/file_with_category/_search
{
  "query": {
  "match_all": {}
  },
  "aggs": {
"categories": {
  "terms": {
"field": "category",
"size": 100
  }
}
  }
}

I get 18 category buckets. What am I doing wrong in the Java code?

Thx.

Alain




Retrieving document groups with MoreLikeThis query

2014-07-24 Thread Alain Désilets
Say I have a set of documents D={d1,..., dn} indexed in ES.

I also have a set of document groups G={g1,...,gk}, where the groups may 
not be disjoint.

Given a group g in G, called the Query Group, I want to find "similar" 
groups g' in G, called the Similar Groups, where "similar" means that the 
documents in g tend to use the same words as the documents in each of the g'.

What would be the easiest way to do this with ES?

Note that I already know how to find individual documents d which are 
similar to a group g. You just do a MoreLikeThis search, feeding it a 
multi-get query with the IDs of the various documents in g. But this will 
retrieve individual documents, when in fact what I want is document groups.

I can think of several ways of doing this (see below). Not sure what the 
pros and cons of each approach are, in terms of speed and "accuracy". Does 
anyone have advice on this?

Thx (PS Sorry for the long post, but it's a pretty complex issue)

Alain

===
Approach 1: Groups as pseudo-documents that concatenate content of their 
members

Create a type 'documents-group'. These are "pseudo-documents" whose content 
will be the concatenation of the content of all documents in a given group.

You then do a MoreLikeThis (MLT) query, feeding it the pseudo-document for 
the Query Group, and asking specifically for documents of type 
documents-group

PROS: 
- Can be supported for sure.

CONS:
- Need to write code in the application to create and manage those 
pseudo-documents.
- The content of each document will be duplicated N times, where N is the 
number of groups that it belongs to.
-- This means the index will be much larger (in my application, a document 
can belong to hundreds of groups)
- Requires a lot of reindexing
-- Every time you change a document, you need to re-index all the groups it 
belongs to.
-- If you add/remove a document to a group, you need to re-index that 
group, which means reindex the equivalent of all the documents it contains 
(in my application, a group may contain 100 documents or more)

===
Approach 2: Groups as pseudo-documents containing only the most important 
terms 

This is similar to approach 1, except that the pseudo-document contains a 
list of the most significant terms of documents in that group (obtained 
with the Significant Terms aggregation), as opposed to containing a 
concatenation of all the member documents.

You then do a MoreLikeThis (MLT) query, feeding it the documents-group 
document for the Query Group, and asking specifically for documents of type 
documents-group

PROS: 
- Can be supported for sure.
- The content of individual documents is not completely duplicated. At 
worst, a handful of terms from each document may be duplicated.

CONS:
- Need to write code in the application to create and manage those 
document-group pseudo-documents.
- Requires a lot of reindexing
-- Every time you change a document, you need to re-generate the list of 
significant terms for all the groups it belongs to.
-- If you add/remove a document to a group, you need to re-generate the list 
of significant terms for that group.
- May be less accurate, as similarity is based on only the most significant 
terms, instead of all available terms. In other words, it could be that 
terms which seem a-priori unimportant, turn out to be very important when 
trying to determine similarity to the Query Group.

===
Approach 3: Multi-get MLT, with automatic aggregation of average score on 
the groups

Each document includes a field that specifies the groups it belongs to.

You then do a MLT search, feeding it a Multi-Get query of the IDs of all 
the documents in the Query Group. The query also asks ES to do a metric 
aggregate using the Group membership field as the basis for the 
aggregation, and the average MLT score as the metric to be computed.

The group that has the highest MLT average score is considered to be the 
most relevant group.

PROS:
- No need to write application code to create and manage pseudo-documents 
for the groups.
- Does not require too much reindexing
-- If you modify a document, you only need to reindex that document.
-- If you add/remove a document in a group, you only need to reindex that 
one document (because the list of groups that the document belongs to has 
changed).

CONS:
- Not clear that it can be done. Possible problems are:
-- The groups are overlapping. In other words, a given document may appear 
in more than one part of the aggregation. Not sure if ES aggregation deals 
properly with that.
-- Not sure that you can use the MLT score as the metric to be averaged. 
The reason being that the score is a dynamic property of each document, 
whose value is determined specifically by MLT at query time. Metric 
aggregation is typically used with static attributes (ex: a document's 
author).
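
A minimal sketch of the averaging step on its own, done client-side in Java 
rather than by an ES aggregation. It assumes each MLT hit has already been 
reduced to a score plus the list of groups found in its membership field. 
GroupScoreAverager and Hit are hypothetical names, not Elasticsearch classes. 
A hit that belongs to several groups simply counts toward each of them. Note 
that the average covers only the hits that were returned; group members that 
MLT did not return are ignored, which may or may not be what you want.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper for illustration; not part of the Elasticsearch API.
final class GroupScoreAverager {

    // One MLT hit: its relevance score and the groups from its membership field.
    static final class Hit {
        final double score;
        final List<String> groups;

        Hit(double score, List<String> groups) {
            this.score = score;
            this.groups = groups;
        }
    }

    // Returns group name -> mean score of the hits that belong to that group.
    // A hit listed in several groups contributes to each of them.
    static Map<String, Double> averageScoreByGroup(List<Hit> hits) {
        Map<String, double[]> sumAndCount = new LinkedHashMap<>();
        for (Hit hit : hits) {
            for (String group : hit.groups) {
                double[] acc = sumAndCount.computeIfAbsent(group, g -> new double[2]);
                acc[0] += hit.score; // running sum of scores
                acc[1] += 1;         // running count of hits
            }
        }
        Map<String, Double> averages = new LinkedHashMap<>();
        for (Map.Entry<String, double[]> e : sumAndCount.entrySet()) {
            averages.put(e.getKey(), e.getValue()[0] / e.getValue()[1]);
        }
        return averages;
    }
}
```

The group with the largest value in the returned map would then be treated as 
the most relevant one.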

=== 
Approach 4: Multi-get MLT, with "manual" aggregation of average score on 
the groups

This is very similar to Approach 3, except that the aggregation of MLT 
scores

Re: Need help with Java API

2014-07-22 Thread Alain Désilets


On Tuesday, 22 July 2014 14:45:28 UTC-4, Jörg Prante wrote:
>
> Noo, the bulk request API is fantastic and a must!
>
> Just take a bit of care not to pass nulls or empty requests to it.
>

Thx for the encouragement. I persevered a bit more and then realized that 
I had misunderstood how to create an IndexRequest. I fixed the problem and 
it's now working.

I have updated the gist demo code accordingly.

Alain



Re: Need help with Java API

2014-07-22 Thread Alain Désilets


On Tuesday, 22 July 2014 14:21:29 UTC-4, Jörg Prante wrote:
>
> To Issue 5: unfortunately, there are some hidden NPEs in the bulk request 
> API.
>

Ah, OK. 

I guess I'll stay away from the bulk request API then.

Alain



Re: Need help with Java API

2014-07-22 Thread Alain Désilets
Thx Jörg,

Your comments helped me get quite a bit further. Here is an updated version 
of my code:

   https://gist.github.com/alaindesilets/aec9492890c37075fa4e

On Monday, 21 July 2014 13:13:58 UTC-4, Jörg Prante wrote:
>
> To issue 1: you create a single node cluster without index, and a client 
> of it. 
>

Duh! Don't know how I could have missed that. So now, I added a 
method createIndex() which creates the index if it doesn't exist, and sets 
number_of_replicas to 2. It doesn't define mappings since the ES doc says 
that default mappings will be automatically generated if none are specified.

Note however that even with that change, I still couldn't see the new index 
in Marvel/Sense, when I instantiated the client with 
method makeClientFromEmbeddedNode(). But if I instantiate the client 
through a TransportClient, i.e. by invoking 
method makeClientFromTransportClient(), then Marvel/Sense sees the new 
index.

Using a TransportClient instead of a client obtained from an embedded node 
also greatly accelerated the client creation. Instead of taking > 8 secs to 
create the client, it takes < 1 sec. Not sure why.
 

>
> To issue 2: you see the UnavailableShardsException caused by a timeout 
> while indexing to a replica shard. This means, you may have set up a single 
> node cluster, but with replica level 1 (default) which needs 2 nodes for 
> indexing. Maybe there was once another node joining the cluster and ES 
> wants it back but it never came (after 60 secs). Then ES returns the 
> timeout error. Maybe replica level 0 helps. You should also check the 
> cluster health. A color of green shows everything works, yellow means there 
> are too few nodes to satisfy the replica condition, and red means the 
> cluster is not in a consistent/usable state.
>

I tried setting number_replica=0 and number_replica=1 in createIndex(), but 
I still got the org.elasticsearch.action.UnavailableShardsException error 
if I instantiate the client with makeClientFromEmbeddedNode(). But I don't 
get the error if I instantiate the client with 
makeClientFromTransportClient(), independently of the number of replicas I 
specify.
 

> To issue 3: not sure what clusterName() means. I would use settings and 
> add a field "cluster.name". Maybe it is equivalent. You must ensure you 
> use the same "cluster.name" setting throughout all nodes and clients. You 
> also can not reuse data from clusters that had other names (look into the 
> "data" folder)
>

I have now moved that code to a method called 
makeClientFromNamedClusterNode(). I haven't been able to make that work at 
all. 

> To issue 4: ES takes ~5 secs for discovery; the zen module pings and waits 
> for responding master nodes by default. If you just test locally on your 
> developer machine, you should disable zen. Most easily by disabling 
> networking altogether, by using NodeBuilder.nodeBuilder().local(true)...
>

Not sure I understand what that means, but in any case, using a 
TransportClient seems to address that issue.

So, all in all, I feel I am in pretty good shape now. Thanks for the help.

There is one new issue I am now encountering when I try to do bulk 
indexing, namely:

Issue 5: If I uncomment the call to index100NewBeersInOneBatch(), I get the 
following:

== Indexing 100 new beer objects as in one batch... (Elapsed so far: 1.0 seconds)
Adding beer no 0 to the bulk request. (Elapsed so far: 1.0 seconds)
Exception in thread "main" java.lang.NullPointerException
    at org.elasticsearch.action.bulk.BulkRequest.internalAdd(BulkRequest.java:129)
    at org.elasticsearch.action.bulk.BulkRequest.add(BulkRequest.java:118)
    at org.elasticsearch.action.bulk.BulkRequestBuilder.add(BulkRequestBuilder.java:52)
    at ca.nrc.ElasticSearch.ElasticSearchDemo.index100NewBeersInOneBatch(ElasticSearchDemo.java:158)
    at ca.nrc.ElasticSearch.ElasticSearchDemo.main(ElasticSearchDemo.java:72)


Any thoughts on what might be going on there?

Thx.

Alain



Need help with Java API

2014-07-21 Thread Alain Désilets
I am trying to get started with the Java API, using the excellent tutorial 
found here:

   http://www.slideshare.net/dadoonet/hands-on-lab-elasticsearch

But I am still having a lot of trouble.

Below is a sample of code that I have written:

package ca.nrc.ElasticSearch;

import org.codehaus.jackson.map.ObjectMapper;
import org.elasticsearch.action.get.GetResponse;
import org.elasticsearch.action.index.IndexResponse;
import org.elasticsearch.client.Client;
import org.elasticsearch.node.NodeBuilder;

public class ElasticSearchRunner {

static ObjectMapper mapper;
static Client client;
static String indexName = "meal5";
static String typeName = "beer";
static long startTimeMSecs;

public static void main(String[] args) throws Exception {
startTimeMSecs = System.currentTimeMillis(); 
mapper = new ObjectMapper(); // create once, reuse
 echo("Creating the ElasticSearch client..."); 
// Does this create a brand new cluster?
client = NodeBuilder.nodeBuilder().node().client();
// Joins existing cluster called "handson":
// client = NodeBuilder.nodeBuilder().clusterName("handson").client(true).node().client();
echo("DONE creating the ElasticSearch client... Elapsed time = "
+ elapsedSecs() + " secs.");
 echo("Creating a beer object...");
Beer beer = new Beer("Heineken", Colour.PALE, 0.33, 3);
String jsonString = mapper.writeValueAsString(beer);
echo("DONE Creating a beer object...");

echo("Indexing the beer object...");
IndexResponse ir = null;
ir = client.prepareIndex(indexName, typeName).setSource(jsonString)
.execute().actionGet();
echo("DONE Indexing the beer object...");

echo("Retrieving the beer object...");
GetResponse gr = null;
gr = client.prepareGet(indexName, typeName, ir.getId()).execute()
.actionGet();
echo("DONE Retrieving the beer object..."); 
}

 public static float elapsedSecs() {
// Divide by a float: long/long division would drop the fractional seconds.
float elapsed = (System.currentTimeMillis() - startTimeMSecs) / 1000f;
return elapsed;
}
 public static void echo(String mess) {
mess = mess + " (Elapsed so far: "+elapsedSecs()+" seconds)";
System.out.println(mess);
}
}


It works, "sort of"...

If I use the first method for creating the client:

client = NodeBuilder.nodeBuilder().node().client();

Then it works fine the first time I run it. However:

*** ISSUE 1: If I try to inspect the meal index with Marvel, I don't find 
it.

Also, 

*** ISSUE 2: If I run the application a second time, I get the following 
output:

Creating the ElasticSearch client... (Elapsed so far: 0.0 seconds)
log4j:WARN No appenders could be found for logger (org.elasticsearch.node).
log4j:WARN Please initialize the log4j system properly.
log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.
DONE creating the ElasticSearch client... Elapsed time = 9.0 secs. (Elapsed so far: 9.0 seconds)
Creating a beer object... (Elapsed so far: 9.0 seconds)
DONE Creating a beer object... (Elapsed so far: 9.0 seconds)
Indexing the beer object... (Elapsed so far: 9.0 seconds)
Exception in thread "main" org.elasticsearch.action.UnavailableShardsException: [meal5][0] [2] shardIt, [0] active : Timeout waiting for [1m], request: index {[meal5][beer][B3F5ZEmSTruqdnlxhYviFg], source[{"brand":"Heineken","colour":"PALE","size":0.33,"price":3.0}]}
    at org.elasticsearch.action.support.replication.TransportShardReplicationOperationAction$AsyncShardOperationAction.raiseTimeoutFailure(TransportShardReplicationOperationAction.java:526)
    at org.elasticsearch.action.support.replication.TransportShardReplicationOperationAction$AsyncShardOperationAction$3.onTimeout(TransportShardReplicationOperationAction.java:516)
    at org.elasticsearch.cluster.ClusterStateObserver$ObserverClusterStateListener.onTimeout(ClusterStateObserver.java:239)
    at org.elasticsearch.cluster.service.InternalClusterService$NotifyTimeout.run(InternalClusterService.java:494)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
    at java.lang.Thread.run(Thread.java:722)


For me to be able to run the application again, I have to change the 
 indexName variable to a different value (ex: "meal6").

Also: 

*** ISSUE 3: If I try using the second method for creating a Client, then 
it doesn't work. More specifically, if I use this method:

client = 
NodeBuilder.nodeBuilder().clusterName("handson").client(true).node().client(); 

This yields:

Creating the ElasticSearch client... (Elapsed so far: 0.0 seconds)
log4j:WARN No appenders could be found for logger (org.elasticsearch.node).
log4j:WARN Please initialize the log4j system properly.
log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.
DONE creating the ElasticSearch client... Elapsed time = 34.0 secs. (Elapsed so far: 34.0 seconds)
Creating a beer object... (Elapsed so far: 34.0 seconds)
DONE Creating a beer object... (Elapsed so far: 34.0 seconds)
Indexing the beer object... (Elapsed so far: 34.0 seconds)
Exception in thread "main"

Re: Tutorial on Java interface to ElasticSearch?

2014-07-03 Thread Alain Désilets
Thanks David. This is exactly what I was looking for.

Alain

On Thursday, 3 July 2014 10:10:31 UTC-4, David Pilato wrote:
>
> Hey Alain,
>
>
> You can have a look at this: 
> https://github.com/elasticsearchfr/hands-on/tree/answers
>
> http://www.elasticsearch.org/guide/en/elasticsearch/client/java-api/current/index.html
>
> I hope this could help.
>
> -- 
> *David Pilato* | *Technical Advocate* | *Elasticsearch.com*
> @dadoonet <https://twitter.com/dadoonet> | @elasticsearchfr 
> <https://twitter.com/elasticsearchfr>
>
>
> On 3 July 2014 at 16:08:15, Alain Désilets (alainde...@gmail.com) wrote:
>
> I am a total newbie to ElasticSearch, and I have been using the following 
> tutorial to get started: 
>
>  
> http://www.elasticsearch.org/guide/en/elasticsearch/guide/current/getting-started.html
>  
> I'm finding it really easy to follow and informative.
>
> Now, I want to use the Java API to write a Java program that uses 
> ElasticSearch, but I can't find an equally helpful tutorial for the Java 
> API. I am surfing and searching through the API documentation, but can't 
> figure out how to do even easy operations like: create an index, put a 
> document into it, retrieve it, etc...
>
> Of course, I could always write methods that send HTTP requests, at which 
> point I could rely on what I learned through the above tutorial. But I 
> would prefer to use a more direct link provided by the Java API.
>
> Does someone know of a good tutorial on the Java API?
>
> Thx.
>
> Alain  Désilets
>
>



Tutorial on Java interface to ElasticSearch?

2014-07-03 Thread Alain Désilets
I am a total newbie to ElasticSearch, and I have been using the following 
tutorial to get started:

http://www.elasticsearch.org/guide/en/elasticsearch/guide/current/getting-started.html

I'm finding it really easy to follow and informative.

Now, I want to use the Java API to write a Java program that uses 
ElasticSearch, but I can't find an equally helpful tutorial for the Java 
API. I am surfing and searching through the API documentation, but can't 
figure out how to do even easy operations like: create an index, put a 
document into it, retrieve it, etc...

Of course, I could always write methods that send HTTP requests, at which 
point I could rely on what I learned through the above tutorial. But I 
would prefer to use a more direct link provided by the Java API.

Does someone know of a good tutorial on the Java API?

Thx.

Alain  Désilets



Re: Error trying to use FSRiver

2014-07-03 Thread Alain Désilets


On Tuesday, 1 July 2014 15:13:34 UTC-4, David Pilato wrote:
>
> Sadly, I can't make any promise here. I think it will be before the end of 
> July though.
>

I understand. Thanks.

Alain



Re: Error trying to use FSRiver

2014-07-01 Thread Alain Désilets
OK, thx. Any timeline for when the upgrade will be available?

Alain

On Tuesday, 1 July 2014 14:57:54 UTC-4, David Pilato wrote:
>
> Yeah. Still on my TODO list to upgrade it!
>
> :)
>
> --
> David ;-)
> Twitter : @dadoonet / @elasticsearchfr / @scrutmydocs
>
>
> On 1 Jul 2014 at 20:24, Alain Désilets wrote:
>
> I am trying to use the FSRiver with elastic-1.2.1.
>
> I installed the plugin as follows:
>
>   bin\plugin -install fr.pilato.elasticsearch.river/fsriver/1.0.0
>
> I then tried to restart ES by doing:
>   bin\service.bat manager
>   Stop
>   Start
>
> But afterwards, ES still has "Service status: stopped". If I look in the 
> log file, I see the following error:
>
> [2014-07-01 14:19:51,072][ERROR][bootstrap] {1.2.1}: Initialization Failed ...
> - ExecutionError[java.lang.NoClassDefFoundError: org/elasticsearch/rest/XContentThrowableRestResponse]
> NoClassDefFoundError[org/elasticsearch/rest/XContentThrowableRestResponse]
>
> ClassNotFoundException[org.elasticsearch.rest.XContentThrowableRestResponse]
>
> I searched for this error and didn't find anything. 
>
> Can anyone see what I might be doing wrong here? 
>
> Is this a compatibility issue? If I look on the FSRiver page:
>
>   http://www.pilato.fr/fsriver/
>
> I see that it only mentions ES 1.0.x and I am using 1.2.1.
>
> Thx.
>
> Alain
>
>



Error trying to use FSRiver

2014-07-01 Thread Alain Désilets
I am trying to use the FSRiver with elastic-1.2.1.

I installed the plugin as follows:

  bin\plugin -install fr.pilato.elasticsearch.river/fsriver/1.0.0

I then tried to restart ES by doing:
  bin\service.bat manager
  Stop
  Start

But afterwards, ES still has "Service status: stopped". If I look in the 
log file, I see the following error:

[2014-07-01 14:19:51,072][ERROR][bootstrap] {1.2.1}: Initialization Failed ...
- ExecutionError[java.lang.NoClassDefFoundError: org/elasticsearch/rest/XContentThrowableRestResponse]
NoClassDefFoundError[org/elasticsearch/rest/XContentThrowableRestResponse]
ClassNotFoundException[org.elasticsearch.rest.XContentThrowableRestResponse]

I searched for this error and didn't find anything. 

Can anyone see what I might be doing wrong here? 

Is this a compatibility issue? If I look on the FSRiver page:

  http://www.pilato.fr/fsriver/

I see that it only mentions ES 1.0.x and I am using 1.2.1.

Thx.

Alain



How to do MoreLikeTHESE in Elastic Search?

2014-06-09 Thread Alain Désilets
I just LOVE the MoreLikeThis (MLT) feature of ElasticSearch and am finding 
new and interesting ways of using it all the time.

However these days, I find myself needing a MoreLikeTHESE feature. In other 
words, given a group of documents that fit together, I want to find other 
documents that might belong in that group.

Is there a "native" way of doing that with the ES API? For example, is it 
possible to do MoreLikeThis on an Aggregation? I looked in the API 
documentation and didn't find anything.

If this is not supported natively, what would be the "best" way to 
implement this using the existing blocks? I can think of at least two:

== Approach 1: Concatenated Pseudo-Document ==

Create a pseudo-document whose content is the concatenation of the content 
of all documents in the group. Add that "document" to the index, then do 
the regular MoreLikeThis on that document.

The disadvantage of this approach is that the Pseudo-Document could become 
very large.

== Approach 2: Multiterm Vector Pseudo-Document ==

Retrieve the multiterms of all the documents in the group. Create a pseudo 
document that contains only the most frequent or most "important" 
multi-terms.

Do the regular MoreLikeThis on this pseudo-document.
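
As a rough sketch of the term-selection step in Approach 2, assuming the 
terms of each document in the group have already been retrieved as plain 
lists of strings: the code below ranks terms by the number of group 
documents they occur in and keeps the top k. GroupTermPicker and topTerms 
are hypothetical names, not Elasticsearch classes, and document frequency 
is only one possible reading of "most frequent"; a different measure of 
importance could be swapped in.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;

// Hypothetical helper for illustration; not part of the Elasticsearch API.
final class GroupTermPicker {

    // Counts in how many documents of the group each term occurs, then
    // returns the k most frequent terms (ties broken alphabetically).
    // Assumes k >= 0.
    static List<String> topTerms(List<List<String>> termsPerDocument, int k) {
        Map<String, Integer> docFreq = new HashMap<>();
        for (List<String> terms : termsPerDocument) {
            // Use a set so a term repeated within one document counts once.
            for (String term : new HashSet<>(terms)) {
                docFreq.merge(term, 1, Integer::sum);
            }
        }
        List<String> ranked = new ArrayList<>(docFreq.keySet());
        ranked.sort((a, b) -> {
            int byCount = Integer.compare(docFreq.get(b), docFreq.get(a));
            return byCount != 0 ? byCount : a.compareTo(b);
        });
        return new ArrayList<>(ranked.subList(0, Math.min(k, ranked.size())));
    }
}
```

The returned terms, joined with spaces, would form the content of the 
pseudo-document that is then indexed and fed to MoreLikeThis.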



