More like this scoring algorithm unclear

2014-01-08 Thread Maarten Roosendaal
Hi,

I have a question about why the 'more like this' algorithm scores some documents 
higher than others, while they are (at first glance) the same.

What I've done is index wishlist documents that contain one property, 
product_id, which holds an array of product ids (e.g. [1234, …]). What I'm 
trying to do is find similar wishlists for a given wishlist with id x. The MLT 
API seems to work: it returns other documents that contain at least one of the 
product_ids from the original list.

But here is what I see: for example, I get 10 hits, and the first 6 hits 
contain the same (and only one) product_id, which is present in the original 
wishlist. I would expect the scores of those first 6 to be identical. Instead, 
only the first 2 share the same score; the next 2 have a lower score and the 
next 2 an even lower one. Why is this?

Also, I'm trying to rewrite the MLT API call as an MLT query, but somehow it 
doesn't work. I would expect that I need to take the entire content of the 
original product_id property and feed it as input to 'like_text'. The 
documentation is not very clear and doesn't provide examples, so I'm a little 
lost.
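For reference, my understanding of the query body I need to build is sketched below in Python; the id values in like_text are made-up placeholders, and the parameter choices are my guess:

```python
import json

# Sketch of a more_like_this query body (0.90.x-era syntax).
# "product_id" is the field from my mapping; the like_text value is a
# hypothetical placeholder for the space-separated ids of the source
# wishlist. min_term_freq/min_doc_freq are lowered because each id
# occurs only once per document and may be rare across the index.
mlt_query = {
    "query": {
        "more_like_this": {
            "fields": ["product_id"],
            "like_text": "1234 5678",  # all ids of the original wishlist
            "min_term_freq": 1,
            "min_doc_freq": 1,
        }
    }
}
print(json.dumps(mlt_query, indent=2))
```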

Hope someone can give some pointers.

Thanks,
Maarten

-- 
You received this message because you are subscribed to the Google Groups 
elasticsearch group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/0e2827b2-5a21-4cff-b773-ebdd861c5972%40googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.


Re: How many metadata fields exist of MP3 file ?

2014-01-08 Thread David Pilato
Have a look at 
https://github.com/elasticsearch/elasticsearch-mapper-attachments/blob/master/src/main/java/org/elasticsearch/index/mapper/attachment/AttachmentMapper.java#L376

You will see that mapper attachment reads:

Metadata.DATE
Metadata.TITLE
Metadata.AUTHOR
Metadata.KEYWORDS
Metadata.CONTENT_TYPE
Metadata.CONTENT_LENGTH

Does it help?
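For completeness, those extracted values end up as sub-fields of the attachment field. A minimal sketch of building such an index request (the field name "file" and the file bytes are hypothetical; the document must be base64-encoded before it is sent):

```python
import base64
import json

# Sketch: build the JSON body for indexing a binary file through the
# mapper-attachments plugin. The attachment field name ("file") and the
# input bytes are hypothetical; at index time the plugin extracts the
# metadata fields listed above (e.g. file.title, file.author, ...).
def attachment_body(raw_bytes):
    return {"file": base64.b64encode(raw_bytes).decode("ascii")}

body = attachment_body(b"ID3...fake mp3 bytes...")
print(json.dumps(body))
```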

-- 
David Pilato | Technical Advocate | Elasticsearch.com
@dadoonet | @elasticsearchfr


On 8 January 2014 at 05:05:10, HongXuan Ji (hxua...@gmail.com) wrote:

Hi all,

I am wondering how many metadata fields of MP3 files exist when I post an mp3 
file into Elasticsearch using the mapper-attachments plugin.

In Solr we can inspect the extracted fields through the endpoint 
SOLR_HOST/update/extract?extractOnly=true, but is there any way to get such 
information in Elasticsearch? And besides MP3 files, what about doc files?

I know Elasticsearch uses Tika to support this; can you give me an example of 
fetching a particular field of a particular file format?

Regards,

Ivan 




Re: ElasticsearchHadoop Hive integration issue

2014-01-08 Thread Badal Mohapatra
Hi Costin,

   Thanks for your kind reply.
After specifying the type in es.resource I am now able to index.

I am using M1, will try with master once indexing is done.

Regards,
Badal


On Tuesday, 7 January 2014 16:21:01 UTC+5:30, Costin Leau wrote:

 Hi, 

 The 'es.resource' you specified is incorrect - you need to specify both an 
 index and a type - e.g. myIndex/products 


 P.S. Are you using M1 or the current master - the latter should give a 
 proper error (and message). 
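Concretely, that means the TBLPROPERTIES line of the table below needs both parts, e.g. (the index name 'es_index' here is hypothetical):

```sql
-- es.resource must name both an index and a type
-- ('es_index' is a hypothetical index name; 'products' is the type)
TBLPROPERTIES('es.resource' = 'es_index/products');
```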

 Thanks, 


 On 07/01/2014 9:48 AM, Badal Mohapatra wrote: 
  Hi, 
  
  I am trying to index data from a hive table to elasticsearch and am 
 using the latest elasticsearch-hadoop-master plugin. 
  My elasticsearch version is 0.90.9 and hive version is hive-0.11.0. 
  
  As per the documentation of elasticsearch-hadoop plugin (hive 
 integration), I successfully created an external table 
  with the below command 
  
  CREATE EXTERNAL TABLE es_products ( 
  sku int, 
  rating float, 
  name string, 
  type string, 
  saleprice float, 
  department string, 
  manufacturer string, 
  userid string, 
  category_name string, 
  query string) 
  STORED BY 'org.elasticsearch.hadoop.hive.ESStorageHandler' 
  TBLPROPERTIES('es.resource' = 'products'); 
  
  Even though the external table is created, 
  I am not able to either insert data into it or even query it. 
  When I do a select * from es_products; 
  I get the exception below. 
  
  hive> select * from es_products; 
  OK 
  Failed with exception 
 java.io.IOException: java.lang.StringIndexOutOfBoundsException: String index 
 out of range: -1 
  Time taken: 1.699 seconds 
  
  
  Can someone please suggest what / where I am going wrong? 
  
  Kind Regards, 
  Badal 
  
  
  
 -- 
 Costin 




Re: How to query custom rest handler in elastic search using Java api

2014-01-08 Thread Shishir Kumar
Hi,

I am not facing any issue with the NodesInfoAction or the custom endpoint 
code. The rest endpoint works fine if I curl it: 
curl -XGET 'localhost:9200/_mastering/nodes?pretty'

I am trying to find out a way to do this from an embedded node. In other 
words, something like below:

Node node = 
NodeBuilder.nodeBuilder().clusterName("elasticsearch").node();
Client client = node.client();
SearchResponse response = 
client.prepareSearch().setSearchType("/_mastering/nodes").
setQuery(QueryBuilders.queryString()).
execute().actionGet();

P.S. the code snippet doesn't actually work, but I want to query 
/_mastering/nodes through the Java API.

On Friday, 3 January 2014 13:35:13 UTC+5:30, Jörg Prante wrote:

 You have wrapped a NodesInfoAction, so all you have to do is

 NodesInfoResponse response = client.admin().cluster().
 prepareNodesInfo().all().execute().actionGet();

 That is the Java API.

 Jörg





Re: Replicating one cluster to another cluster

2014-01-08 Thread joergpra...@gmail.com
First, and most important, the good news: ES 1.0.0.Beta2 has the
snapshot/restore feature in place, so it should be easy to snapshot and
restore the result to a target cluster. Snapshots are also
incremental.
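As a sketch, the snapshot flow looks like this (the repository name and filesystem location below are hypothetical, and the request bodies are shown as Python dicts for brevity):

```python
import json

# Sketch of the ES 1.0 snapshot/restore flow. The repository name
# ("my_backup") and location are hypothetical; the location must be
# reachable from every node of the source cluster.
register_repo = {
    "type": "fs",
    "settings": {"location": "/mnt/es_backups/my_backup"},
}

# The calls, in order (shown as comments since this is only a sketch):
#   PUT  /_snapshot/my_backup                  body: register_repo
#   PUT  /_snapshot/my_backup/snap_1           takes an incremental snapshot
#   POST /_snapshot/my_backup/snap_1/_restore  restores on the target cluster
print(json.dumps(register_repo))
```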

Second, there is also news about the knapsack plugin.

In the next knapsack plugin version due this week, a full copy from
cluster1 to cluster2 will be as simple as

curl -XPOST 'http://cluster1node:port1
/_export/copy?cluster=cluster2name&host=cluster2node&port=port2'

The limitations will be: the knapsack plugin must be installed on
cluster1node, the JVM version must be the same in cluster1 and cluster2, the
ES version must be the same in cluster1 and cluster2, and all your indexes
must have stored fields, preferably the _source field. Also, cluster1 must
not modify the indexes while the _export/copy is running, or cluster2 may
end up with different data (there is no inherent locking).

In the new knapsack export version, you will be able to use arbitrary ES
queries to select subsets of the cluster data to copy, so only the hits of
a query can be transferred.

Jörg



Re: Unique Count in aggregations

2014-01-08 Thread Vaidik Kapoor
I haven't tried the aggregations module, but if what you want is unique
terms, I think you can do that using the terms facet as well. In that case,
you will have to choose a size large enough that ES returns all the terms
and does not discard any when size is smaller than the number of unique
terms.
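Something along these lines (the field name and size below are placeholders; size just needs to exceed the number of distinct terms you expect):

```python
import json

# Sketch: a terms facet used to approximate a unique-terms listing.
# "my_field" and the size are hypothetical; counting the returned terms
# gives the unique count as long as size exceeds the true cardinality.
facet_request = {
    "size": 0,  # we only want the facet, not the hits
    "facets": {
        "unique_values": {
            "terms": {"field": "my_field", "size": 10000}
        }
    },
}
print(json.dumps(facet_request))
```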

Vaidik Kapoor
vaidikkapoor.info


On 8 January 2014 14:41, Konstantinos Zacharakis kzach...@gmail.com wrote:

 Hello,

 I would like to ask about the support of unique terms in aggregations.
 Shay mentioned in issue #1044
 (https://github.com/elasticsearch/elasticsearch/issues/1044) that once the
 aggregation framework was done you planned to add this new feature. Since
 aggregations have been available since Beta2, how close on your roadmap is
 unique-terms support? Should we expect it in the 1.0.0 release?

 Kind Regards
 Kostas






Re: Unique Count in aggregations

2014-01-08 Thread Konstantinos Zacharakis
Hi Vaidik,

This method is fine when the term cardinality is low, and it can also be 
achieved using the aggregations framework. However, when the cardinality is 
high, the memory footprint will also be high, and that is certainly not 
safe.




cassandra river plugin installation issue

2014-01-08 Thread shamsul haque
I have downloaded the river from: https://github.com/eBay/cassandra-river

and changed the settings in CassandraRiver.java to match my Cassandra 
setup:

if (riverSettings.settings().containsKey("cassandra")) {
    @SuppressWarnings("unchecked")
    Map<String, Object> couchSettings = (Map<String, Object>) 
riverSettings.settings().get("cassandra");
    this.clusterName = 
XContentMapValues.nodeStringValue(couchSettings.get("cluster_name"), "Test 
Cluster");
    this.keyspace = 
XContentMapValues.nodeStringValue(couchSettings.get("keyspace"), 
"topic_space");
    this.columnFamily = 
XContentMapValues.nodeStringValue(couchSettings.get("column_family"), 
"users");
    this.batchSize = 
XContentMapValues.nodeIntegerValue(couchSettings.get("batch_size"), 1000);
    this.hosts = 
XContentMapValues.nodeStringValue(couchSettings.get("hosts"), 
"localhost:9160");
    this.username = 
XContentMapValues.nodeStringValue(couchSettings.get("username"), 
"USERNAME");
    this.password = 
XContentMapValues.nodeStringValue(couchSettings.get("password"), "P$$WD");
} else {
    /*
     * Set default values
     */
    this.clusterName = "Test Cluster";
    this.keyspace = "topic_space";
    this.columnFamily = "users";
    this.batchSize = 1000;
    this.hosts = "localhost:9160";
    this.username = "USERNAME";
    this.password = "P$$WD";
}

When I build with Maven using the given command, mvn clean package, the 
Maven test log shows:

---
 T E S T S
---
Running org.elasticsearch.river.cassandra.CassandraRiverIntegrationTest
Configuring TestNG with: 
org.apache.maven.surefire.testng.conf.TestNG652Configurator@67eaf25d
Exception in thread Queue-Indexer-thread-0 java.lang.NullPointerException
at 
org.elasticsearch.river.cassandra.CassandraRiver$Indexer.run(CassandraRiver.java:149)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:744)
Exception in thread Queue-Indexer-thread-2 java.lang.NullPointerException
at 
org.elasticsearch.river.cassandra.CassandraRiver$Indexer.run(CassandraRiver.java:149)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:744)
Exception in thread Queue-Indexer-thread-5 java.lang.NullPointerException
at 
org.elasticsearch.river.cassandra.CassandraRiver$Indexer.run(CassandraRiver.java:149)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:744)
Exception in thread Queue-Indexer-thread-4 java.lang.NullPointerException

I tried the same after installing the plugin in ES; it shows the same error 
continuously.
Does anybody have any idea what's going wrong with my setup?




Re: cassandra river plugin installation issue

2014-01-08 Thread shamsul haque
CassandraRiver.java:149 contains:
logger.info("Starting thread with {} keys", this.keys.rowColumnMap.size());
where this.keys, or its rowColumnMap, may be null, hence the 
NullPointerException.

At first I built the river module normally and installed it as a plugin 
in ES.
But when I ran this script:
curl -XPUT 'localhost:9200/_river/userinfo/_meta' -d '{
    "type" : "cassandra",
    "cassandra" : {
        "cluster_name" : "Test Cluster",
        "keyspace" : "topic_space",
        "column_family" : "users",
        "batch_size" : 100,
        "hosts" : "localhost:9160"
    },
    "index" : {
        "index" : "userinfo",
        "type" : "users"
    }
}'


the same error appears in the ES console, the same as the one I copied from 
the Maven console, and it is also not fetching data from Cassandra into ES.



Re: How many metadata fields exist of MP3 file ?

2014-01-08 Thread David Pilato
I would recommend not using the mapper attachment but managing that on your 
side.
For example, I removed the mapper attachment from the fsriver project to get 
finer control (see https://github.com/dadoonet/fsriver/issues/38).

BTW, I'm not aware of how you can get the ALBUM field using Tika. Any pointer? 
It could be nice to add it to fsriver as well.

-- 
David Pilato | Technical Advocate | Elasticsearch.com
@dadoonet | @elasticsearchfr


On 8 January 2014 at 10:49:47, HongXuan Ji (hxua...@gmail.com) wrote:

Thanks for the reply.

Except for the six standard fields, I also want to know about extra fields. 
For example, in Solr we can extract the album field of an MP3 file.
Is this also supported in Elasticsearch? I just tested: I posted an mp3 
file into ES, but the indexed document contains only the six fields.

Ideas?

Thanks a lot.



Re: cassandra river plugin installation issue

2014-01-08 Thread shamsul haque
One change I made in the cassandra-river project was to bump the Cassandra 
jar version from 1.3 to 2.0.3 in pom.xml, as I am using Cassandra 2.0.4.
Any idea what's going wrong?



Re: cassandra river plugin installation issue

2014-01-08 Thread David Pilato
So probably

CassandraCFData cassandraData = db.getCFData(columnFamily, start, 1000);

did not get any data from Cassandra?


I've never played with this plugin, or with Cassandra, so I'm afraid I can't 
help more here!


-- 
David Pilato | Technical Advocate | Elasticsearch.com
@dadoonet | @elasticsearchfr





Re: cassandra river plugin installation issue

2014-01-08 Thread shamsul haque
OK, thanks for pointing this out.



Pls help me: i insert log to elasticsearch, but it use too much memory, how to solve it?thanks

2014-01-08 Thread xjj210130
Dear all:
   I insert 1 logs into elasticsearch; each log is about 2M and has 
about 3000 keys and values.
 When I have inserted about 2, it uses about 30G of memory, and then 
elasticsearch becomes very slow and it's hard to insert logs.
 Could someone help me solve this? Thanks very much.



Re: Order results by value in one of the array entries.

2014-01-08 Thread Johan E
Hi Jun,

Thanks for your reply.

I'm not sure how I can get that to work. In my project I need to 
boost/order only by the stock of warehouse_a; how do I use only the value 
of that entry in the array?
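For concreteness, I imagine the script-based sort would look something like the sketch below; the script body is my guess at the syntax, and I am not sure it is right:

```python
import json

# Sketch of a script-based sort that picks out the stock of one
# warehouse. The script body is an assumption (old MVEL-style syntax);
# with arrays of objects the fields get flattened, so a robust version
# may need the inventory mapped as a nested type instead.
sort_clause = {
    "sort": {
        "_script": {
            "script": (
                "s = 0; "
                "foreach (entry : _source.inventory) { "
                "  if (entry.warehouse == 'warehouse_a') { s = entry.stock; } "
                "} "
                "return s;"
            ),
            "type": "number",
            "order": "desc",
        }
    }
}
print(json.dumps(sort_clause))
```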

Thanks
Johan

On Wednesday, January 8, 2014 4:35:50 AM UTC, Jun Ohtani wrote:

 Hi Johan, 

 You could try script-based sorting: 

 http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/search-request-sort.html#_script_based_sorting
  

 Or the function score query. 

 http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/query-dsl-function-score-query.html#_script_score
  

 I hope this helps. 

 Regards, 

  
 Jun Ohtani 
 joh...@gmail.com 
 blog : http://blog.johtani.info 
 twitter : http://twitter.com/johtani 




 On 2014/01/07 at 19:45, Johan E (joha...@gmail.com) wrote: 

  Hi, 
  
  I'm trying to order the result of a query by a specified entry in an 
 array. 
  
  Here is a sample entry: 
  
  { 
    "product_name": "product alfa", 
    "product_id": "4a86c92ccd26111d7ba0eada7da6a75af", 
    "description": "This is a sample product", 
    "image_id": "product_a.jpg", 
    "inventory": [ 
      { "warehouse": "warehouse_a", "stock": 99 }, 
      { "warehouse": "warehouse_b", "stock": 19 }, 
      { "warehouse": "warehouse_c", "stock": 99 } 
    ] 
  } 
  
  If there were more products containing alfa, I would (for example) 
 want to sort them by the stock of a warehouse. 
  
  I'm currently using a query like: 
  
  POST _search 
  { 
    "query": { 
      "match": { 
        "product_name": { 
          "query": "alfa", 
          "type": "phrase" 
        } 
      } 
    }, 
    "filter": { 
      "bool": { 
        "must": [ 
          { 
            "term": { 
              "availability.warehouse": "warehouse_a" 
            } 
          } 
        ] 
      } 
    } 
  } 
  
  I would like the results sorted by stock (for warehouse_a only) 
 descending. 
  
  Any ideas? 
  
  





Is it possible to do a text query against a (pre)defined set of fields?

2014-01-08 Thread Ville Mattila
Hi,

I am indexing some large documents in an index. When making full-text 
queries, I've generally used {"text": {"_all": "some text search"}} to find 
all possible results. However, the documents contain a few private fields 
that should be queryable only by a certain user group.

What I was wondering is whether there is a way to define some kind of 
alias for a set of fields (or, even better, for all fields except a given 
set) in the mapping definition. I could then run a query like {"text": 
{"alias_for_public_fields": "some text search"}} and the private fields 
would not be searched. I do not know if this is already possible?

I know that it's possible to list all the fields in the query and leave out 
the private ones, but as there can be hundreds of fields that should be 
queryable and only 2-3 private fields, listing fields explicitly adds 
significant overhead to the queries.
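For reference, the explicit-listing workaround I mean is sketched below (the field names are made up); the fields list is exactly what I would like to replace with an alias:

```python
import json

# Sketch of the explicit-field workaround: a multi-field query over
# every public field. Field names are hypothetical; with hundreds of
# public fields this list becomes the overhead described above.
public_fields = ["title", "description", "comments"]  # ...hundreds more
query = {
    "query": {
        "multi_match": {
            "query": "some text search",
            "fields": public_fields,
        }
    }
}
print(json.dumps(query))
```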

Best regards,
Ville



Re: How many metadata fields exist of MP3 file ?

2014-01-08 Thread HongXuan Ji
Hi David,

I only got the ALBUM field by using the endpoint of Solr, which is 
HOST/solr/update/extract?extractOnly=true.
So it seems the mapper attachment does not support the extra field 
extraction. right?

BTW, can you give me some tutorial about the fsriver? I am also curious 
what's the plugin for ? What's the purpose of the plugin?

Best,

Ivan

David Pilato於 2014年1月8日星期三UTC+8下午6時23分03秒寫道:

 I would recommend not to use the mapper attachment but to manage that on 
 your side.
 I removed for example mapper attachment from fsriver project to have a 
 finer control. (see https://github.com/dadoonet/fsriver/issues/38)

 BTW, I'm not aware on how you can get ALBUM field using Tika. Any pointer? 
 Could be nice to add it to fsriver as well.

 -- 
 *David Pilato* | *Technical Advocate* | *Elasticsearch.com*
 @dadoonet https://twitter.com/dadoonet | 
 @elasticsearchfrhttps://twitter.com/elasticsearchfr


 Le 8 janvier 2014 at 10:49:47, HongXuan Ji (hxu...@gmail.com javascript:) 
 a écrit:

 Thanks for the reply. 

 Except for the six standard fields, I also want to know the extra field. 
 For example, in Solr we can extract the album field in MP3 file.
 Does this function also support in ElasticSearch? I just tested: I post a 
 mp3 file into ES, but the fields of the mp3 file contains only the six 
 fields.

 Ideas?

 Thanks a lot.

 David Pilato於 2014年1月8日星期三UTC+8下午4時34分07秒寫道: 

  Have a look at 
 https://github.com/elasticsearch/elasticsearch-mapper-attachments/blob/master/src/main/java/org/elasticsearch/index/mapper/attachment/AttachmentMapper.java#L376
  
  You will see that mapper attachment reads:
  
  Metadata.DATE
  Metadata.TITLE
  Metadata.AUTHOR
  Metadata.KEYWORDS
  Metadata.CONTENT_TYPE
  Metadata.CONTENT_LENGTH
  
  Does it help?

  -- 
 *David Pilato* | *Technical Advocate* | *Elasticsearch.com* 
  @dadoonet https://twitter.com/dadoonet | 
 @elasticsearchfrhttps://twitter.com/elasticsearchfr
  

 Le 8 janvier 2014 at 05:05:10, HongXuan Ji (hxu...@gmail.com) a écrit:

  Hi all, 

 I am wondering which metadata fields of an MP3 file exist when I post the 
 file into Elasticsearch using the mapper-attachment plugin. 

 In Solr we can get the field information through the endpoint 
 SOLR_HOST/update/extract?extractOnly=true, 

 but in Elasticsearch is there any way to get such information? And besides 
 MP3 files, what about doc files? 

 I know Elasticsearch uses Tika to support this; can you give me an example 
 of fetching a particular field from a particular file format?

 Regards,

 Ivan 


  --
 You received this message because you are subscribed to the Google Groups 
 elasticsearch group.
 To unsubscribe from this group and stop receiving emails from it, send an 
 email to elasticsearc...@googlegroups.com.
 To view this discussion on the web visit 
 https://groups.google.com/d/msgid/elasticsearch/742f86b9-9dd8-4354-ae50-26332f0c4dc0%40googlegroups.com
 .
 For more options, visit https://groups.google.com/groups/opt_out.
  
   --



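A minimal sketch of how a file is usually handed to the mapper-attachments type: the raw bytes go into the mapped field as a base64 string, from which Tika extracts the six metadata fields listed above (the `attachment_doc` helper and the "file" field name are illustrative, not part of the plugin's API):

```python
import base64
import json

def attachment_doc(path):
    """Build the JSON body mapper-attachments expects: the raw file
    bytes, base64-encoded, placed in the field mapped as "attachment"."""
    with open(path, "rb") as f:
        encoded = base64.b64encode(f.read()).decode("ascii")
    # "file" is a hypothetical field name; date/title/author/keywords/
    # content_type/content_length are then extracted server-side by Tika.
    return json.dumps({"file": encoded})
```

Anything beyond those six fields (such as ALBUM) would, per this thread, need to be extracted client-side before indexing.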


Re: Please help me: I insert logs to Elasticsearch, but it uses too much memory; how can I solve it? Thanks

2014-01-08 Thread xjj210130


On Wednesday, January 8, 2014 6:58:02 PM UTC+8, xjj2...@gmail.com wrote:

 Dear all: 
    I insert 1 logs into Elasticsearch; each log is about 2M, and 
 there are about 3000 keys and values.
  When I have inserted about 2, it uses about 30G of memory, and then 
 Elasticsearch becomes very slow and it is hard to insert logs.
  Could someone help me solve this? Thanks very much.



The following is my log format:
{
  "user1": [{"costprice": 122}, {"sellprice": 124}, {"stock": 12}, {"sell": 122}, {}, {}],
  ...
}
 



Re: Please help me: I insert logs to Elasticsearch, but it uses too much memory; how can I solve it? Thanks

2014-01-08 Thread xjj210130


On Wednesday, January 8, 2014 6:58:02 PM UTC+8, xjj2...@gmail.com wrote:

 Dear all: 
    I insert 1 logs into Elasticsearch; each log is about 2M, and 
 there are about 3000 keys and values.
  When I have inserted about 2, it uses about 30G of memory, and then 
 Elasticsearch becomes very slow and it is hard to insert logs.
  Could someone help me solve this? Thanks very much.


The following is my log format:
{
  "user1": [{"costprice": 122}, {"sellprice": 124}, {"stock": 12}, {"sell": 122}, {}, {}],
  ...
  "product": [{}],
  "name": []
}

There is information for about 4000~ 1 users, so a log may be about 2M.
 Thanks



Re: Please help me: I insert logs to Elasticsearch, but it uses too much memory; how can I solve it? Thanks

2014-01-08 Thread David Pilato
Do you insert that using bulk?

-- 
David Pilato | Technical Advocate | Elasticsearch.com
@dadoonet | @elasticsearchfr


On 8 January 2014 at 12:29:33, xjj210...@gmail.com (xjj210...@gmail.com) wrote:



On Wednesday, January 8, 2014 6:58:02 PM UTC+8, xjj2...@gmail.com wrote:
Dear all:
       I insert 1 logs into Elasticsearch; each log is about 2M, and there 
are about 3000 keys and values.
 When I have inserted about 2, it uses about 30G of memory, and then 
Elasticsearch becomes very slow and it is hard to insert logs.
 Could someone help me solve this? Thanks very much.

The following is my log format:
{
  "user1": [{"costprice": 122}, {"sellprice": 124}, {"stock": 12}, {"sell": 122}, {}, {}],
  ...
  "product": [{}],
  "name": []
}

There is information for about 4000~ 1 users, so a log may be about 2M.
 Thanks



Re: How many metadata fields exist of MP3 file ?

2014-01-08 Thread David Pilato
Mapper attachment does not support extra field extraction. Maybe you could 
open an issue there: 
https://github.com/elasticsearch/elasticsearch-mapper-attachments 

About FSRiver, I guess everything is described here: 
https://github.com/dadoonet/fsriver#filesystem-river-for-elasticsearch
Is there something you don't understand?


-- 
David Pilato | Technical Advocate | Elasticsearch.com
@dadoonet | @elasticsearchfr


On 8 January 2014 at 12:24:11, HongXuan Ji (hxua...@gmail.com) wrote:

Hi David,

I only got the ALBUM field by using Solr's endpoint, which is 
HOST/solr/update/extract?extractOnly=true.
So it seems the mapper attachment does not support extra field extraction, 
right?

BTW, can you point me to a tutorial about fsriver? I am also curious what 
the plugin is for. What is its purpose?

Best,

Ivan




Re: Please help me: I insert logs to Elasticsearch, but it uses too much memory; how can I solve it? Thanks

2014-01-08 Thread xjj210130


On Wednesday, January 8, 2014 7:30:21 PM UTC+8, David Pilato wrote:

 Do you insert that using bulk?

 -- 
 *David Pilato* | *Technical Advocate* | *Elasticsearch.com*
 @dadoonet https://twitter.com/dadoonet | @elasticsearchfr https://twitter.com/elasticsearchfr



 No, I insert the logs one by one, using Thrift to transport them. I set 
heap_size=30G; when I insert 2, it uses 30G of memory. I don't change 
elasticsearch.yml except for heap_size and thrift.frame (for most values I 
use the defaults). Thanks, 



Re: How many metadata fields exist of MP3 file ?

2014-01-08 Thread HongXuan Ji
OK, I will post the issue later.

About the river, 

The first line says: "This river plugin helps to index documents from your local 
file system and using SSH."

Does it mean that if I store a bunch of PDF files in a local directory, I can 
use the river plugin to search the files in that directory?

In fact, I started to study Elasticsearch this week and I am not very 
familiar with what "filesystem" means here.
Thanks a lot.

Ivan
On Wednesday, January 8, 2014 at 7:32:17 PM UTC+8, David Pilato wrote:

 Mapper attachment does not support extra field extraction. Maybe you 
 could open an issue there: 
 https://github.com/elasticsearch/elasticsearch-mapper-attachments 

 About FSRiver, I guess everything is described here: 
 https://github.com/dadoonet/fsriver#filesystem-river-for-elasticsearch
 Is there something you don't understand?


 -- 
 *David Pilato* | *Technical Advocate* | *Elasticsearch.com*
 @dadoonet https://twitter.com/dadoonet | @elasticsearchfr https://twitter.com/elasticsearchfr



Re: Please help me: I insert logs to Elasticsearch, but it uses too much memory; how can I solve it? Thanks

2014-01-08 Thread David Pilato
That was not really my question: are you using the bulk feature?

-- 
David Pilato | Technical Advocate | Elasticsearch.com
@dadoonet | @elasticsearchfr


On 8 January 2014 at 12:38:00, xjj210...@gmail.com (xjj210...@gmail.com) wrote:

The Elasticsearch version I use is 0.90.2.




Re: How many metadata fields exist of MP3 file ?

2014-01-08 Thread David Pilato
Yes. It indexes the documents available on your local hard drive.

-- 
David Pilato | Technical Advocate | Elasticsearch.com
@dadoonet | @elasticsearchfr


On 8 January 2014 at 12:42:56, HongXuan Ji (hxua...@gmail.com) wrote:

OK, I will post the issue later.

About the river, 

The first line says: "This river plugin helps to index documents from your local 
file system and using SSH."

Does it mean that if I store a bunch of PDF files in a local directory, I can 
use the river plugin to search the files in that directory?

In fact, I started to study Elasticsearch this week and I am not very 
familiar with what "filesystem" means here.
Thanks a lot.

Ivan
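For reference, the fsriver README shows a river being created by PUTting a meta document to `_river/<name>/_meta`; a minimal example of that document follows (the path and update rate are placeholder values, not recommendations):

```json
{
  "type": "fs",
  "fs": {
    "url": "/path/to/docs",
    "update_rate": 3600000
  }
}
```

With this in place, the plugin periodically scans the directory and indexes the files it finds, which is what makes them searchable.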

Re: Please help me: I insert logs to Elasticsearch, but it uses too much memory; how can I solve it? Thanks

2014-01-08 Thread xjj210130
No, I don't use bulk. You mean that using bulk may solve the problem? Thanks

On Wednesday, January 8, 2014 7:43:41 PM UTC+8, David Pilato wrote:

 That was not really my question. Are you using BULK feature?

 -- 
 *David Pilato* | *Technical Advocate* | *Elasticsearch.com*
 @dadoonet https://twitter.com/dadoonet | @elasticsearchfr https://twitter.com/elasticsearchfr


 On 8 January 2014 at 12:38:00, xjj2...@gmail.com (xjj2...@gmail.com) wrote:

 The Elasticsearch version I use is 0.90.2.




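A minimal sketch of the bulk format discussed above: the `_bulk` endpoint takes newline-delimited JSON, one action line followed by one source line per document, which cuts per-document request overhead compared with one-by-one indexing (the `bulk_body` helper and the index/type names are illustrative):

```python
import json

def bulk_body(index, doc_type, docs):
    """Build an NDJSON body for POST /_bulk: for each document,
    one action line followed by one source line."""
    lines = []
    for doc in docs:
        lines.append(json.dumps({"index": {"_index": index, "_type": doc_type}}))
        lines.append(json.dumps(doc))
    return "\n".join(lines) + "\n"  # _bulk requires a trailing newline

body = bulk_body("logs", "log", [{"user1": [{"costprice": 122}]}])
```

Sending batches of a few hundred documents per request is the usual starting point; whether it helps the memory pressure described in this thread would still need to be measured.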


memory surges in client app when a node dies

2014-01-08 Thread nicolas . long
Hi all,

I have a situation where if a node in our cluster dies (for whatever 
reason) the client app experiences a surge in memory usage, full GCs, and 
essentially dies.

I think this is because the client holds on to the connections for a while 
before realising the node is dead.

Does this sound possible? And does anyone have tips for how to deal with 
this? My thinking so far is:

1. More memory

2. A circuit-breaker pattern or some such to make sure the app disconnects 
quicker when ES is not responding

But are there ways to configure the ES client to improve the behaviour here?

Thanks,

Nic

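A few client-side settings that influence how quickly dead nodes are dropped from the connection pool (assuming the Java TransportClient on a 0.90.x cluster; the values below are illustrative, not recommendations):

```
client.transport.ping_timeout: 2s            # drop unresponsive nodes sooner
client.transport.nodes_sampler_interval: 2s  # refresh the connected-nodes list more often
client.transport.sniff: true                 # sample the cluster state to find live nodes
```

Tightening these on the client is one way to get the "disconnect quicker" behaviour described in point 2 without building a separate circuit breaker.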


Re: Please help me: I insert logs to Elasticsearch, but it uses too much memory; how can I solve it? Thanks

2014-01-08 Thread xjj210130
The Elasticsearch version I use is 0.90.2.

On Wednesday, January 8, 2014 7:30:21 PM UTC+8, David Pilato wrote:

 Do you insert that using bulk?

 -- 
 *David Pilato* | *Technical Advocate* | *Elasticsearch.com*
 @dadoonet https://twitter.com/dadoonet | @elasticsearchfr https://twitter.com/elasticsearchfr







Re: Please help me: I insert logs to Elasticsearch, but it uses too much memory; how can I solve it? Thanks

2014-01-08 Thread xjj210130
I only insert the log to elasticsearch.   I will do the following wrok:
 1:  write the data to elasticsearch.
 2: Then to search the data.

Now, when i insert the data to es, It used too much memory. I wonder why 
the es use so much memory.
Could you give me some suggestions. Thanks

 I use jmap to watch the pid. The result is as follows (I changed the 
heap_size to 1G to watch the memory use):

num       #instances    #bytes      class description
--
1:  229353  18348240java.util.WeakHashMap$Entry[]
2:  229353  12843768java.util.WeakHashMap
3:  145045  8703384 org.elasticsearch.index.mapper.FieldMapper[]
4:  229353  7339296 java.lang.ref.ReferenceQueue
5:  235890  5661360 
org.elasticsearch.common.collect.RegularImmutableMap$TerminalEntry
6:  229346  5504304 org.apache.lucene.util.CloseableThreadLocal
7:  57303   4125816 
org.elasticsearch.index.mapper.core.LongFieldMapper
8:  85939   3836608 char[]
9:  155465  3731160 
org.elasticsearch.common.collect.RegularImmutableMap$NonTerminalEntry
10: 229353  3669648 java.lang.ThreadLocal
11: 229353  3669648 java.lang.ref.ReferenceQueue$Lock
12: 229353  3669648 java.util.concurrent.atomic.AtomicInteger
13: 114662  3669184 
org.elasticsearch.index.analysis.NamedAnalyzer
14: 28698   3518912 
org.elasticsearch.common.collect.RegularImmutableMap$LinkedEntry[]
15: 145044  3481056 java.util.Arrays$ArrayList
16: 145044  3481056 org.elasticsearch.index.mapper.FieldMappers
17: 114620  2750880 
org.elasticsearch.index.analysis.NumericLongAnalyzer
18: 52044   2081760 org.apache.lucene.document.FieldType
19: 85939   2062536 java.lang.String
20: 57499   1839968 
org.elasticsearch.index.mapper.FieldMapper$Names
21: 114683  1834928 
org.apache.lucene.analysis.Analyzer$PerFieldReuseStrategy
22: 114662  1834592 
org.apache.lucene.analysis.Analyzer$GlobalReuseStrategy
23: 57493   1379832 
org.elasticsearch.index.fielddata.FieldDataType
24: 57332   1375968 
org.elasticsearch.index.mapper.core.NumberFieldMapper$1
25: 57303   1375272 org.elasticsearch.common.Explicit
26: 14321   1267344 byte[]
27: 37088   1186816 java.util.HashMap$Entry
28: 14300   915200 
 org.elasticsearch.index.mapper.object.ObjectMapper
29: 2180660520  java.lang.Object[]
30: 14349   573960 
 org.elasticsearch.common.collect.RegularImmutableMap
31: 16458   526656 
 org.elasticsearch.common.collect.RegularImmutableList
32: 14314   343536  org.apache.lucene.index.Term
33: 14314   343536  org.apache.lucene.util.BytesRef
34: 14293   343032 
 org.elasticsearch.common.collect.RegularImmutableMap$EntrySet
35: 14293   343032 
 org.elasticsearch.common.collect.RegularImmutableAsList
36: 14293   343032 
 org.elasticsearch.common.collect.ImmutableMapValues
37: 8   279936  java.util.HashMap$Entry[]
38: 14314   229024  java.lang.Object
39: 14314   229024 
 org.elasticsearch.common.lucene.search.TermFilter
40: 216451936   org.elasticsearch.index.mapper.ObjectMappers
41: 1   16400   java.lang.String[]
42: 119 8568   
 org.elasticsearch.index.mapper.core.StringFieldMapper
43: 1   8208   
 org.elasticsearch.common.jackson.core.sym.CharsToNameCanonicalizer$Bucket[]
44: 28  1120   
 org.elasticsearch.common.collect.SingletonImmutableBiMap
45: 14  728 org.elasticsearch.index.mapper.RootMapper[]
46: 7   728 
org.elasticsearch.index.mapper.DocumentMapper
47: 7   672 
org.elasticsearch.index.mapper.internal.TimestampFieldMapper
48: 28  672 
org.elasticsearch.common.collect.SingletonImmutableSet
49: 7   616 
org.elasticsearch.index.mapper.internal.TTLFieldMapper
50: 7   560 
org.elasticsearch.index.mapper.internal.SourceFieldMapper
51: 7   560 
org.elasticsearch.index.mapper.internal.SizeFieldMapper
52: 7   504 
org.elasticsearch.index.mapper.object.RootObjectMapper
53: 7   504 
org.elasticsearch.index.mapper.internal.BoostFieldMapper
54: 21  504 
org.elasticsearch.index.analysis.FieldNameAnalyzer
55: 14  448 
java.util.concurrent.locks.ReentrantLock$NonfairSync
56: 7   392 
org.elasticsearch.index.mapper.internal.UidFieldMapper
57: 7   392 
org.elasticsearch.index.mapper.internal.IdFieldMapper
58: 7   392 

Re: Order results by value in one of the array entries.

2014-01-08 Thread Johan E
I ended up changing the format of the json, with warehouse stock in 
separate entries in an array. This way I can check for it and get the stock 
at the same time.



Re: memory surges in client app when a node dies

2014-01-08 Thread nicolas . long
We're using the Java transport client.

The problem only happens when the app is dealing with a high number of 
requests. I wondered whether it was because the client takes a little bit 
of time to detect that the node is unavailable: potentially up to 10 
seconds in total (with default settings - 5 seconds to ping the node, 
another 5 for the timeout).

And perhaps even after the node has been dropped the existing connections 
to the node still need to timeout (not sure what the default is here)?

On Wednesday, 8 January 2014 13:19:29 UTC, Jason Wee wrote:

 It should not be possible, right? If you configure the client app to have two 
 or more elasticsearch nodes, it should detect if an elasticsearch node is down 
 and not use it during indexing/querying.

 What client are you using?

 Jason


 On Wed, Jan 8, 2014 at 7:48 PM, nicola...@guardian.co.uk wrote:

 Hi all,

 I have a situation where if a node in our cluster dies (for whatever 
 reason) the client app experiences a surge in memory usage, full GCs, and 
 essentially dies.

 I think this is because the client holds on to the connections for a 
  while before realising the node is dead.

 Does this sound possible? And does anyone have tips for how to deal with 
 this. My thinking so far is:

 1. More memory

 2. A circuit-breaker pattern or some such to make sure the app 
 disconnects quicker when ES is not responding

 But are there ways to configure the ES client to improve the behaviour 
 here?

 Thanks,

 Nic







Re: memory surges in client app when a node dies

2014-01-08 Thread nicolas . long
I think you probably replied just after mine!

We are using the transport client yes. And to clarify, ES itself is fine 
during these periods. It is the client app that has problems.

On Wednesday, 8 January 2014 13:34:29 UTC, Jörg Prante wrote:

 Have you tried TransportClient? TransportClient does not share the heap 
 memory with a cluster node. The setting client.transport.ping_timeout 
 checks if the nodes connected still respond. By default, it is 5 seconds, I 
 use values up to 30 seconds to survive long GCs without disconnects.

 Jörg





Re: Beta2 Java Client: java.nio.channels.UnresolvedAddressException

2014-01-08 Thread joergpra...@gmail.com
There are FQDNs like vcll36a-1001.equity.csfb.com which cannot be resolved
by your DNS settings, it seems.

14 eth interfaces are quite a lot to try to connect to; I would reduce
them using the network interface eth alias names ES provides:
http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/modules-network.html

Jörg



Re: memory surges in client app when a node dies

2014-01-08 Thread joergpra...@gmail.com
ES TransportClient uses a RetryListener which is a bit flaky in case of
exceptions caused by faulty nodes. Some users reported an explosion of port
use and connection retries, and this may also bring the client memory to a
limit. Maybe you have stack traces that show abnormal behavior, so it's
worth raising a GitHub issue?

Jörg



No hit using scan/scroll with has_parent filter

2014-01-08 Thread Jean-Baptiste Lièvremont
Hi folks,

I use a parent/child mapping configuration which works flawlessly with 
classic search requests, e.g. using has_parent to find child documents 
with criteria on the parent documents.

I am trying to get all child document IDs that match a given set of 
criteria using scan and scroll, which also works well - until I introduce 
the has_parent filter, in which case the scroll request returns no hit 
(although total_hits is correct).

Is it a known issue?

I can provide sample mapping files and queries with associated/expected 
results. Please note that this behavior has been noticed on 0.90.6 but is 
still present in 0.90.9.

Thanks, best regards,
-- Jean-Baptiste Lièvremont
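For anyone trying to reproduce, the failing combination would look roughly like this (a sketch with hypothetical type and field names), sent with search_type=scan&scroll=1m:

```json
{
  "query": {
    "filtered": {
      "query": { "match_all": {} },
      "filter": {
        "has_parent": {
          "parent_type": "parent_doc",
          "query": { "term": { "status": "active" } }
        }
      }
    }
  }
}
```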



Re: More like this scoring algorithm unclear

2014-01-08 Thread Justin Treher
Hey Maarten,

I would use the explain:true option to see just why your documents are 
being scored higher than others. MoreLikeThis using the same fulltext 
scoring as far as I know, so term position would affect score. 

http://lucene.apache.org/core/3_0_3/api/contrib-queries/org/apache/lucene/search/similar/MoreLikeThis.html

Justin

On Wednesday, January 8, 2014 3:04:47 AM UTC-5, Maarten Roosendaal wrote:

 Hi,

 I have a question about why the 'more like this' algorithm scores 
 documents higher than others, while they are (at first glance) the same.

 What i've done is index wishlist-documents which contain 1 property: 
 product_id, this property contains an array of product_id's (e.g. [1234, 
 , , ]. What I'm trying to do is find similar wishlists for a 
 given wishlist with id x. The MLT API seems to work, it returns other 
 documents which contain at least 1 of the product_id's from the original 
 list.

 But what I see is that, for example, I get 10 hits; the first 6 hits 
 contain the same (and only 1) product_id, and this product_id is present in the 
 original wishlist. What I would expect is that the score of the first 6 is 
 the same. However, what I see is that only the first 2 have the same score, the 
 next 2 a lower score, and the next 2 even lower. Why is this?

 Also, I'm trying to write the MLT API call as an MLT query, but somehow it 
 doesn't work. I would expect that I need to take the entire content of the 
 original product_id property and feed it as input for the 'like_text'. The 
 documentation is not very clear and doesn't provide examples, so I'm a 
 little lost.

 Hope someone can give some pointers.

 Thanks,
 Maarten
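On the second question, a rough sketch of the MLT API call rewritten as a more_like_this query (the like_text ids are hypothetical placeholders; in practice like_text would be the space-joined product_ids of the source wishlist):

```json
{
  "query": {
    "more_like_this": {
      "fields": ["product_id"],
      "like_text": "1234 5678 9012",
      "min_term_freq": 1,
      "min_doc_freq": 1
    }
  }
}
```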




Searching indexed fields without analysing

2014-01-08 Thread Chris H
Hi.  I've deployed elasticsearch with logstash and kibana to take in 
Windows logs from my OSSEC log server, following this guide: 
http://vichargrave.com/ossec-log-management-with-elasticsearch/
I've tweaked the logstash config to extract some specific fields from the 
logs, such as User_Name.  I'm having some issues searching on these fields 
though.

These searches work as expected:

   - User_Name: * 
   - User_Name: john.smith
   - User_Name: john.*
   - NOT User_Name: john.*

But I'm having problems with Computer accounts, which take the format 
"w-dc-01$" - they're being split on the "-" and the "$" is ignored. So a 
search for "w-dc-01" returns all the servers named "w-"anything. Also I 
can't do NOT User_Name: *$ to exclude computer accounts.

The mappings are created automatically by logstash, and GET 
/logstash-2014.01.08/_mapping shows:

"User_Name": {

   "type": "multi_field",
   "fields": {
      "User_Name": {
         "type": "string",
         "omit_norms": true
      },
      "raw": {
         "type": "string",
         "index": "not_analyzed",
         "omit_norms": true,
         "index_options": "docs",
         "include_in_all": false,
         "ignore_above": 256
      }
   }
},

My (limited) understanding is that the not_analyzed should stop the field 
being split, so that my searching matches the full name, but it doesn't.  
I'm trying both kibana and curl to get results.

Hope this makes sense.  I really like the look of elasticsearch, but being 
able to search on extracted fields like this is pretty key to me using it.

Thanks.
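For what it's worth, the not_analyzed copy in a multi_field mapping is addressed as a sub-field, so an exact, unsplit match can target User_Name.raw. A sketch of such a term query (value taken from the example above):

```json
{
  "query": {
    "term": { "User_Name.raw": "w-dc-01$" }
  }
}
```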




Re: How to query custom rest handler in elastic search using Java api

2014-01-08 Thread Ivan Brusic
The CustomRestAction code you posted contains *exactly* the Java code you
need to execute the same action as the REST action.

If you still want to use the REST URL, you cannot use the
elasticsearch libraries. /_mastering/nodes is not a valid search type.
The action does not even execute a query technically, but retrieves node
level information.

Cheers,

Ivan


On Wed, Jan 8, 2014 at 12:46 AM, Shishir Kumar shishir.su...@gmail.com wrote:

 Hi,

 I am not facing any issue with the NodesInfoAction or the custom endpoint
 code. The rest endpoint is working fine if I curl to it using: curl -XGET 
 'localhost:9200/_mastering/nodes?pretty'.

 I am trying to find out a way to do this from an embedded node. In other
 words, somthing like below:

 Node node = NodeBuilder.nodeBuilder().clusterName("elasticsearch").node();
 Client client = node.client();
 SearchResponse response = client.prepareSearch()
     .setSearchType("/_mastering/nodes")
     .setQuery(QueryBuilders.queryString(""))
     .execute().actionGet();

 P.S. the code snippet doesn't actually work. But I want to query the
 /_mastering/nodes through Java api.


 On Friday, 3 January 2014 13:35:13 UTC+5:30, Jörg Prante wrote:

 You have wrapped a NodesInfoAction, so all you have to do is

 NodesInfoResponse response = client.admin().cluster().prepareNodesInfo().all().execute().actionGet();

 That is the Java API.

 Jörg





Odd hot MVEL

2014-01-08 Thread Nikolas Everett
Does anyone know what might be causing MVEL to do this:
   100.3% (501.3ms out of 500ms) cpu usage by thread
'elasticsearch[elastic1002][search][T#23]'
 9/10 snapshots sharing following 47 elements
   java.lang.Throwable.fillInStackTrace(Native Method)
   java.lang.Throwable.fillInStackTrace(Throwable.java:782)
   java.lang.Throwable.<init>(Throwable.java:265)
   java.lang.Exception.<init>(Exception.java:66)
   java.lang.RuntimeException.<init>(RuntimeException.java:62)

java.lang.IllegalArgumentException.<init>(IllegalArgumentException.java:53)
   sun.reflect.GeneratedMethodAccessor45.invoke(Unknown Source)

sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
   java.lang.reflect.Method.invoke(Method.java:606)

org.elasticsearch.common.mvel2.optimizers.impl.refl.nodes.GetterAccessor.getValue(GetterAccessor.java:43)

org.elasticsearch.common.mvel2.optimizers.impl.refl.nodes.MapAccessorNest.getValue(MapAccessorNest.java:54)

org.elasticsearch.common.mvel2.optimizers.impl.refl.nodes.VariableAccessor.getValue(VariableAccessor.java:37)

org.elasticsearch.common.mvel2.ast.ASTNode.getReducedValueAccelerated(ASTNode.java:108)

org.elasticsearch.common.mvel2.MVELRuntime.execute(MVELRuntime.java:86)

org.elasticsearch.common.mvel2.compiler.CompiledExpression.getDirectValue(CompiledExpression.java:123)

org.elasticsearch.common.mvel2.compiler.CompiledExpression.getValue(CompiledExpression.java:119)

org.elasticsearch.common.mvel2.compiler.CompiledExpression.getValue(CompiledExpression.java:106)

org.elasticsearch.common.mvel2.ast.Substatement.getReducedValueAccelerated(Substatement.java:44)

org.elasticsearch.common.mvel2.ast.BinaryOperation.getReducedValueAccelerated(BinaryOperation.java:114)

org.elasticsearch.common.mvel2.ast.BinaryOperation.getReducedValueAccelerated(BinaryOperation.java:114)

org.elasticsearch.common.mvel2.compiler.ExecutableAccessor.getValue(ExecutableAccessor.java:42)

org.elasticsearch.common.mvel2.optimizers.impl.refl.nodes.MethodAccessor.executeAndCoerce(MethodAccessor.java:164)

org.elasticsearch.common.mvel2.optimizers.impl.refl.nodes.MethodAccessor.getValue(MethodAccessor.java:73)

org.elasticsearch.common.mvel2.ast.ASTNode.getReducedValueAccelerated(ASTNode.java:108)

org.elasticsearch.common.mvel2.MVELRuntime.execute(MVELRuntime.java:86)

org.elasticsearch.common.mvel2.compiler.CompiledExpression.getDirectValue(CompiledExpression.java:123)

org.elasticsearch.common.mvel2.compiler.CompiledExpression.getValue(CompiledExpression.java:119)

org.elasticsearch.script.mvel.MvelScriptEngineService$MvelSearchScript.run(MvelScriptEngineService.java:191)

org.elasticsearch.script.mvel.MvelScriptEngineService$MvelSearchScript.runAsDouble(MvelScriptEngineService.java:206)

org.elasticsearch.common.lucene.search.function.ScriptScoreFunction.score(ScriptScoreFunction.java:54)

It isn't an error.  Looking at MVEL's source it looks like it catches this
error and works around it by inspecting the function, casting the arguments
appropriately, and then retrying.  I imagine it'd be nice and fast if I
didn't get the types wrong but it works anyway which feels a bit trappy at
scale.

I know this is caused by scoring tons of documents in a FunctionScore which
is a pretty strong argument for moving all FunctionScoring into a rescore
for protection but what in the world am I doing with MVEL to make it do
this?

My candidate MVEL looks like this:
log10( ($doc['a'].empty ? 0 : $doc['a']) + ($doc['b'].empty ? 0 :
$doc['b']) + 2 )


I'm trying to reproduce it with the debugger and Elasticsearch's tests but
I haven't had any luck yet so I'd love to hear if anyone else has seen this.

Nik
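As a sanity check on the arithmetic, the candidate script can be mirrored in plain Python (a sketch; the field names and the missing-field handling are taken from the MVEL above, and the boxed numeric types involved on the JVM side are not modeled here):

```python
import math

def mvel_score(a=None, b=None):
    # Mirrors: log10( ($doc['a'].empty ? 0 : $doc['a'])
    #               + ($doc['b'].empty ? 0 : $doc['b']) + 2 )
    # A missing field contributes 0; the +2 keeps the log argument >= 2.
    a_val = 0 if a is None else a
    b_val = 0 if b is None else b
    return math.log10(a_val + b_val + 2)

print(mvel_score())       # both fields missing -> log10(2)
print(mvel_score(98, 0))  # -> 2.0
```

This confirms the formula itself is well defined for empty fields; the repeated IllegalArgumentException is purely a JVM-side reflection detail, as described above.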



Re: Upgrades causing Elastic Search downtime

2014-01-08 Thread Jenny Sivapalan
Thanks both for the replies. Our rebalance process doesn't take too long 
(~5 mins per node). I had some of the plugins (head, paramedic, bigdesk) 
open as I was closing down the old nodes and didn't see any split brain 
issue although I agree we can lead ourselves down this route by doubling 
the instances. We want our cluster to rebalance as we bring nodes in and 
out so disabling is not going to work for us unless I'm misunderstanding? 


On Tuesday, 7 January 2014 22:16:46 UTC, Mark Walkom wrote:

 You can also use cluster.routing.allocation.disable_allocation to reduce 
 the need of waiting for things to rebalance.

 Regards,
 Mark Walkom

 Infrastructure Engineer
 Campaign Monitor
 email: ma...@campaignmonitor.com
 web: www.campaignmonitor.com
  

 On 8 January 2014 04:41, Ivan Brusic iv...@brusic.com wrote:

 Although elasticsearch should support clusters of nodes with different 
 minor versions, I have seen issues between minor versions. Version 0.90.8 
 did contain an upgrade of Lucene (4.6), but that does not look like it 
 would cause your issue. You could look at the github issues tagged 
 0.90.[8-9] and see if something applies in your case.

 A couple of points about upgrading:

 If you want to use the double-the-nodes techniques (which should not be 
 necessary for minor version upgrades), you could decommission a node 
 using the Shard API. Here is a good writeup: 
 http://blog.sematext.com/2012/05/29/elasticsearch-shard-placement-control/

 Since you doubled the amount of nodes in the cluster, 
 the minimum_master_nodes setting would be temporarily incorrect and 
 potential split-brain clusters might occur. In fact, it might have occurred 
 in your case since the cluster state seems incorrect. Merely hypothesizing.

 Cheers,

 Ivan 


 On Tue, Jan 7, 2014 at 9:26 AM, Jenny Sivapalan jennifer@gmail.com wrote:

 Hello,

 We've upgraded Elastic Search twice over the last month and have 
 experienced downtime (roughly 8 minutes) during the roll out. I'm not sure 
 if it is something we are doing wrong or not.

 We use EC2 instances for our Elastic Search cluster and cloud formation 
 to manage our stack. When we deploy a new version or change to Elastic 
 Search we upload the new artefact, double the number of EC2 instances and 
 wait for the new instances to join the cluster.

 For example 6 nodes form a cluster on v 0.90.7. We upload the 0.90.9 
 version via our deployment process and double the number nodes for the 
 cluster (12). The 6 new nodes will join the cluster with the 0.90.9 
 version. 

 We then want to remove each of the 0.90.7 nodes. We do this by shutting 
 down the node (using the plugin head), wait for the cluster to rebalance 
 the shards and then terminate the EC2 instances. Then repeat with the next 
 node. We leave the master node until last so that it does the re-election 
 just once.

 The issue we have found in the last two upgrades is that while the 
 penultimate node is shutting down the master starts throwing errors and the 
 cluster goes red. To fix this we've stopped the Elastic Search process on 
 master and have had to restart each of the other nodes (though perhaps they 
 would have rebalanced themselves in a longer time period?). We find 
 that we send an increase error response to our clients during this time.

 We've set our queue size for search to 300 and we start to see the queue 
 get full:
at java.lang.Thread.run(Thread.java:724)
 2014-01-07 15:58:55,508 DEBUG action.search.type[Matt Murdock] 
 [92036651] Failed to execute fetch phase
 org.elasticsearch.common.util.concurrent.EsRejectedExecutionException: 
 rejected execution (queue capacity 300) on 
 org.elasticsearch.action.search.type.TransportSearchQueryThenFetchAction$AsyncAction$2@23f1bc3
 at 
 org.elasticsearch.common.util.concurrent.EsAbortPolicy.rejectedExecution(EsAbortPolicy.java:61)
 at 
 java.util.concurrent.ThreadPoolExecutor.reject(ThreadPoolExecutor.java:821)


 But also we see the following error which we've been unable to find the 
 diagnosis for:
  2014-01-07 15:58:55,530 DEBUG index.shard.service   [Matt Murdock] 
 [index-name][4] Can not build 'doc stats' from engine shard state 
 [RECOVERING]
 org.elasticsearch.index.shard.IllegalIndexShardStateException: 
 [index-name][4] CurrentState[RECOVERING] operations only allowed when 
 started/relocated
 at 
 org.elasticsearch.index.shard.service.InternalIndexShard.readAllowed(InternalIndexShard.java:765)

  Are we doing anything wrong or has anyone experienced this? 

 Thanks,
 Jenny


Re: More like this scoring algorithm unclear

2014-01-08 Thread Maarten Roosendaal
Hi,

Thanks, i'm not quite sure how to do that. I'm using:
http://localhost:9200/lists/list/[id of 
list]/_mlt?mlt_field=product_id&min_term_freq=1&min_doc_freq=1

the body does not seem to be respected (i'm using the elasticsearch head 
plugin) if I add:
{
  "explain": true
}

i've been trying to rewrite the mlt api as an mlt query but no luck so far. 
Any suggestions?

Thanks,
Maarten
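If the _mlt endpoint won't take a body, one workaround sketch is to run the equivalent query through _search, where explain is honored as a top-level request option (the more_like_this_field syntax and the ids here are assumptions for the 0.90.x query DSL):

```json
{
  "explain": true,
  "query": {
    "more_like_this_field": {
      "product_id": {
        "like_text": "1234 5678",
        "min_term_freq": 1,
        "min_doc_freq": 1
      }
    }
  }
}
```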

On Wednesday, 8 January 2014 16:14:25 UTC+1, Justin Treher wrote:

 Hey Maarten,

 I would use the explain:true option to see just why your documents are 
  being scored higher than others. MoreLikeThis uses the same fulltext 
 scoring as far as I know, so term position would affect score. 


 http://lucene.apache.org/core/3_0_3/api/contrib-queries/org/apache/lucene/search/similar/MoreLikeThis.html

 Justin

 On Wednesday, January 8, 2014 3:04:47 AM UTC-5, Maarten Roosendaal wrote:

 Hi,

 I have a question about why the 'more like this' algorithm scores 
 documents higher than others, while they are (at first glance) the same.

 What i've done is index wishlist-documents which contain 1 property: 
 product_id, this property contains an array of product_id's (e.g. [1234, 
  , , ]. What I'm trying to do is find similar wishlists for a 
 given wishlist with id x. The MLT API seems to work, it returns other 
 documents which contain at least 1 of the product_id's from the original 
 list.

 But what I see is that, for example, I get 10 hits; the first 6 hits 
 contain the same (and only 1) product_id, and this product_id is present in the 
 original wishlist. What I would expect is that the score of the first 6 is 
 the same. However, what I see is that only the first 2 have the same score, the 
 next 2 a lower score, and the next 2 even lower. Why is this?

 Also, I'm trying to write the MLT API call as an MLT query, but somehow it 
 doesn't work. I would expect that I need to take the entire content of the 
 original product_id property and feed it as input for the 'like_text'. The 
 documentation is not very clear and doesn't provide examples, so I'm a 
 little lost.

 Hope someone can give some pointers.

 Thanks,
 Maarten





Timestamp and _timestamp

2014-01-08 Thread Николай Колев
Hi all,
I have an ES cluster with four nodes and 157 indexes. There are about 140 
million entries that occupy around 50 GB (1 primary index with one 
replica). There are 2 data nodes, one pure master, and one client node that 
serves as a gate for web requests.
In recent days I started to observe that the cluster becomes very unstable and 
every few hours one of the data servers stops unexpectedly. The only solution 
was to reboot all data nodes to be able to process future logging.
My mapping contains these definitions:
"Timestamp": {
    "type": "date",
    "format": "date_time"
}
and 
"_timestamp": { "enabled": true, "path": "Timestamp" },

After some tests I discovered that if I make a request filtering on 
Timestamp, the CPU load becomes very high and the cluster gets unstable. All 
incoming events are rejected.
But when I make requests filtering on _timestamp, everything works well as 
expected.

My question is: why is this happening, and what is the source of this 
behavior?
Any ideas how to fix it?

Thanks in advance,
Nickolay Kolev



Re: Is it possible to do a text query against a (pre)defined set of fields?

2014-01-08 Thread InquiringMind
Ville,

Perhaps: Don't include the private fields in _all. Then a query against 
_all would be restricted to the (perhaps hundreds) of public fields.

A query that includes the private fields would need to list _all and then 
the private fields. But since you have only 2 or 3 private fields, there 
shouldn't be much overhead on the query.

Brian



Can wildcard, matched fields have relevance scoring?

2014-01-08 Thread project2501
Hi,
  I am doing an 'exists' query on a field that is matched to a text field. 
 The results all come back with the same score.
Example:

_metadata:[* TO *]  // Match documents where this field exists

matched field ['text']

This searches only documents that contain a field called _metadata and 
highlights that field into 'text' field. 

I want the results to be ranked based on size of _metadata field or # of 
matches.

Is it possible?



Re: Is it possible to do a text query against a (pre)defined set of fields?

2014-01-08 Thread Ville Mattila
Hi,

Well - I think I've understood something wrong here. Isn't _all a special 
key that includes all indexed fields? Is there a possibility to change the 
fields included in _all?

Ville


Ville,

 Perhaps: Don't include the private fields in _all. Then a query against 
 _all would be restricted to the (perhaps hundreds) of public fields.

 A query that includes the private fields would need to list _all and then 
 the private fields. But since you have only 2 or 3 private fields, there 
 shouldn't be much overhead on the query.

 Brian




Re: Does the server support streaming?

2014-01-08 Thread joergpra...@gmail.com
You are correct: ES nodes consume data request by request before it is
passed on through the cluster. The same goes for bulk indexing requests:
they are temporarily pushed to buffers, but they are split by lines and
executed as single actions.

So to reduce network roundtrips, the best thing is to use the bulk API.
What is left is a few percent to optimize, which is not worth much. With
gzip, ES HTTP provides transparent compression. The main challenge is HTTP
overhead (headers can't be compressed), and base64, if you use binary data
with ES.

Please note that you must evaluate the bulk responses too, in order to
validate success at the document level.
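Jörg's point about evaluating the bulk response is worth a sketch: a bulk request can succeed as a whole while individual actions fail. The snippet below runs against a hand-written response in the general shape the bulk API returns (exact field names vary by ES version; this is not a live call):

```python
# A hand-written response in the general shape the bulk API returns;
# field names are assumptions, check them against your ES version.
response = {
    "took": 5,
    "items": [
        {"index": {"_index": "wishlists", "_id": "1", "ok": True}},
        {"index": {"_index": "wishlists", "_id": "2",
                   "error": "MapperParsingException[failed to parse]"}},
    ],
}

def failed_actions(bulk_response):
    """Collect per-document failures from a bulk response."""
    failures = []
    for item in bulk_response["items"]:
        # Each item wraps exactly one action: index, create, update or delete.
        for action, result in item.items():
            if "error" in result:
                failures.append((action, result.get("_id"), result["error"]))
    return failures

print(failed_actions(response))  # one failure expected, for _id "2"
```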

It is possible to extend the whole ES API also to Websocket, so besides
JSON over HTTP, it would also be possible to transfer JSON text frames or
SMILE/binary frames on a single bi-directional channel. HTTP must use two
channels for this, so with Websocket you can cut the connection resources
in half. In this sense, the Netty channel / REST / Java API could be
extended for special realtime WS streaming applications, like pubsub. I
experimented with that some time ago on ES 0.20:
https://github.com/jprante/elasticsearch-transport-websocket (needs
updating)

From what I understand, the thrift transport plugin compiles the ES API,
operates in a streaming-like fashion, and provides a solution that
reduces HTTP overhead:
https://github.com/elasticsearch/elasticsearch-transport-thrift

Jörg



Re: More like this scoring algorithm unclear

2014-01-08 Thread Maarten Roosendaal
The scoring algorithm is still vague to me, but I got the query to act like 
the API. The results are different, though, so I'm still doing something 
wrong. Here's an example:
{
  "explain": true,
  "query": {
    "more_like_this": {
      "fields": [
        "PRODUCT_ID"
      ],
      "like_text": "104004855475 1001004002067765 100200494210 1002004004499883",
      "min_term_freq": 1,
      "min_doc_freq": 1,
      "max_query_terms": 1,
      "percent_terms_to_match": 0.5
    }
  },
  "from": 0,
  "size": 50,
  "sort": [],
  "facets": {}
}

the like_text contains product_id's from a wishlist for which i want to 
find similar lists

On Wednesday, January 8, 2014, 16:50:53 UTC+1, Maarten Roosendaal wrote:

 Hi,

 Thanks, i'm not quite sure how to do that. I'm using:
 http://localhost:9200/lists/list/[id of 
 list]/_mlt?mlt_field=product_id&min_term_freq=1&min_doc_freq=1

 the body does not seem to be respected (i'm using the elasticsearch head 
 plugin) if i add:
 {
   "explain": true
 }

 i've been trying to rewrite the mlt api as an mlt query but no luck so 
 far. Any suggestions?

 Thanks,
 Maarten

On Wednesday, January 8, 2014, 16:14:25 UTC+1, Justin Treher wrote:

 Hey Maarten,

 I would use the explain:true option to see just why your documents are 
 being scored higher than others. MoreLikeThis uses the same fulltext 
 scoring as far as I know, so term position would affect the score. 


 http://lucene.apache.org/core/3_0_3/api/contrib-queries/org/apache/lucene/search/similar/MoreLikeThis.html

 Justin

 On Wednesday, January 8, 2014 3:04:47 AM UTC-5, Maarten Roosendaal wrote:

 Hi,

 I have a question about why the 'more like this' algorithm scores 
 documents higher than others, while they are (at first glance) the same.

 What i've done is index wishlist-documents which contain 1 property: 
 product_id, this property contains an array of product_id's (e.g. [1234, 
 , , ]. What i'm trying to do is find similar wishlists for a 
 given wishlist with id x. The MLT API seems to work, it returns other 
 documents which contain at least 1 of the product_id's from the original 
 list.

 But what i see is that, for example, i get 10 hits, and the first 6 hits 
 contain the same (and only 1) product_id; this product_id is present in the 
 original wishlist. What i would expect is that the score of the first 6 is 
 the same. However, what i see is that only the first 2 have the same score, 
 the next 2 a lower score, and the next 2 even lower. Why is this?

 Also, i'm trying to write the MLT API as an MLT query, but somehow it 
 doesn't work. I would expect that i need to take the entire content of the 
 original product_id property and feed it as input for the 'like_text'. The 
 documentation is not very clear and doesn't provide examples so i'm a 
 little lost.

 Hope someone can give some pointers.

 Thanks,
 Maarten





Elasticsearch Missing Data

2014-01-08 Thread Eric Luellen
Hello,

I've had my elasticsearch instance running for about a week with no issues, 
but last night it stopped working. When I went to look in Kibana, it stopped 
logging around 20:45 on 1/7/14. I then restarted the service on both 
elasticsearch servers and it started logging again and pulled back some 
logs from 07:10 that morning, even though I restarted the service around 
10:00. So my questions are:

1. Why did it stop working? I don't see any obvious errors.
2. When I restarted it, why did it pull back only some of the data and not 
all of it? I see that there are no unassigned shards.

curl -XGET 'http://localhost:9200/_cluster/health?pretty=true'
{
  "cluster_name" : "my-elasticsearch",
  "status" : "green",
  "timed_out" : false,
  "number_of_nodes" : 3,
  "number_of_data_nodes" : 2,
  "active_primary_shards" : 40,
  "active_shards" : 80,
  "relocating_shards" : 0,
  "initializing_shards" : 0,
  "unassigned_shards" : 0
}

Are there any additional queries or logs I can look at to see what is going 
on? 

On a slight side note, when I restarted my 2nd elasticsearch server it 
isn't reading from the /etc/elasticsearch.yml file like it should. It isn't 
creating the node name correctly or putting the data files in the spot I 
have configured. I'm using CentOS and doing everything via 
/etc/init.d/elasticsearch on both servers and the elasticsearch1 server 
reads everything correctly but elasticsearch2 does not.

Thanks for your help.
Eric



Re: Strategy for keeping Elasticsearch updated with MySQL

2014-01-08 Thread David Pilato
I would do 1/ to have more near-real-time search.
Also, I like the idea of having an object in memory and simply pushing it to 
MySQL and to ES at the same time. No need to read the object back from MySQL 
to index it in another process (proposition 2).

That said you could use also a Message Queue in the middle if you want to be 
able at some point to stop your ES cluster without stopping your application.
This is what I did in the past.

My 2 cents

-- 
David Pilato | Technical Advocate | Elasticsearch.com
@dadoonet | @elasticsearchfr


On January 8, 2014 at 20:13:40, arthurX (fc28...@gmail.com) wrote:

Hello! I use MySQL as my primary datastore and use Elasticsearch to further 
index the documents.
My problem is keeping the data in ES in sync with MySQL.

Currently I have two methods in mind:
1. Whenever I add or update an entry in MySQL, do the same action in ES.
2. Run some cron jobs that periodically keep ES in sync with the data in MySQL.

For method 2, I wonder how I can check whether an entry is already indexed in 
Elasticsearch. And would it be efficient at all if I have to check every entry 
to see if it is updated? 

I am new to the technology and I am afraid I have missed some really obvious and 
established solutions here. Or otherwise, what is the normal way this situation 
is handled?


Very open Elasticsearch installation

2014-01-08 Thread Nikolas Everett
I've spent the past six months or so writing and deploying a replacement
on-site search system and we've finally decided it was time to blog about it:
http://blog.wikimedia.org/2014/01/06/wikimedia-moving-to-elasticsearch/.
I figure this list might find this useful because everything
(http://git.wikimedia.org/summary/?r=mediawiki/extensions/CirrusSearch.git) is
open source
(http://git.wikimedia.org/tree/operations%2Fpuppet.git/production/modules%2Felasticsearch)
and public
(http://ganglia.wikimedia.org/latest/?c=Elasticsearch%20cluster%20eqiad&m=cpu_report&r=hour&s=by%20name&hc=4&mc=2).
It looks like we're averaging about 3000 queries per second
(http://ganglia.wikimedia.org/latest/?r=day&cs=&ce=&m=es_queries&s=by+name&c=Elasticsearch+cluster+eqiad&h=&host_regex=&max_graphs=0&tab=m&vn=&hide-hf=false&sh=1&z=small&hc=4)
and about 300 updates per second
(http://ganglia.wikimedia.org/latest/?r=day&cs=&ce=&m=es_indexes&s=by+name&c=Elasticsearch+cluster+eqiad&h=&host_regex=&max_graphs=0&tab=m&vn=&hide-hf=false&sh=1&z=small&hc=4)
at the moment, which doesn't make us a very big installation, but we're
excited in our own little way.  If you have time please give it a shot here
(https://it.wikipedia.org/wiki/Speciale:Ricerca) or here
(https://en.wikisource.org/wiki/Special:Search) or here
(https://www.mediawiki.org/wiki/Special:Search) or, if you don't mind
somewhat uglier results, here (https://www.wikidata.org/wiki/Special:Search).
If you notice anything fishy please let me know or file a bug
(https://bugzilla.wikimedia.org/enter_bug.cgi?product=MediaWiki%20extensions&component=CirrusSearch).
We're a pretty small team but we'll get to everything eventually.

Wish me/us luck.  Over the next few months we'll be doubling the number of
documents indexed and doubling the update rate and ramping up the query
rate by about an order of magnitude.

Thanks for reading,

Nik



Re: Very open Elasticsearch installation

2014-01-08 Thread David Pilato
This is really awesome Nik!
Congrats to your team.

I'm a bit disappointed that this search gives no result: 
https://en.wikisource.org/w/index.php?title=Special%3ASearch&profile=default&search=elasticsearch&fulltext=Search
 :-)

Best

-- 
David Pilato | Technical Advocate | Elasticsearch.com
@dadoonet | @elasticsearchfr


On January 8, 2014 at 21:46:20, Nikolas Everett (nik9...@gmail.com) wrote:

I've spent the past six months or so writing and deploying a replacement on 
site search system and we've finally decided it was time to blog about it.  I 
figure this list might find this useful because everything is open source and 
public.  It looks like we're averaging about 3000 queries per second and about 
300 updates per second at the moment which doesn't make us a very big 
installation but we're excited in our own little way.  If you have time please 
give it a shot here or here or here or, if you don't mind somewhat uglier 
results, here.  If you notice anything fishy please let me know or  file a bug. 
 We're a pretty small team but we'll get to everything eventually.

Wish me/us luck.  Over the next few months we'll be doubling the number of 
documents indexed and doubling the update rate and ramping up the query rate by 
about an order of magnitude.

Thanks for reading,

Nik


Re: Strategy for keeping Elasticsearch updated with MySQL

2014-01-08 Thread Николай Колев
Hi Arthur,

I did something similar years ago when I was working for a newspaper.
We kept articles in a database, and full-text search was done with an external 
program. There was a trigger on the article tables that, on every change 
operation, added a record to a queue table. Something like this:
article_id, operation_type, table_name
Then there was a cron job every minute that read from this table and:
- On delete, deleted the entry
- On update, deleted the entry, generated a new simple page with the new 
article - only title and content - and handed it to the indexer to be indexed
- On insert, generated a new simple page with the new article - only title 
and content - and handed it to the indexer to be indexed

Articles were placed in a directory like this: 
/root_dir/table_name/id/content.html. This path was then returned 
and easily parsed to generate the appropriate link to the article.

After success, the respective record was removed from the queue, giving near 
realtime indexing.

This can be done with ES, but much more easily.
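The trigger-and-queue flow above can be sketched as follows, with an in-memory SQLite table as the queue and a plain dict standing in for the search index (all names are illustrative, not a real ES integration):

```python
import sqlite3

# In-memory queue table standing in for the database-side queue;
# a plain dict stands in for the search index.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE queue (article_id INTEGER, operation TEXT)")
db.executemany("INSERT INTO queue VALUES (?, ?)",
               [(1, "insert"), (2, "update"), (1, "delete")])

index = {}

def drain_queue():
    # Process queued changes in insertion order, then remove each
    # handled row, the way the cron job in the post did.
    rows = db.execute(
        "SELECT rowid, article_id, operation FROM queue ORDER BY rowid"
    ).fetchall()
    for rowid, article_id, op in rows:
        if op == "delete":
            index.pop(article_id, None)
        else:  # insert or update: (re)index title + content
            index[article_id] = "article %d" % article_id
        db.execute("DELETE FROM queue WHERE rowid = ?", (rowid,))
    db.commit()

drain_queue()
print(sorted(index))  # article 1 was inserted then deleted, so only 2 remains
```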

best regards,
Nickolay Kolev

Wednesday, January 8, 2014, 21:13:35 UTC+2, arthurX wrote:

 Hello! I use MySQL as my primary datastore and use Elasticsearch to 
 further index the documents.
 My problem is keeping the data in ES in sync with MySQL.

 Currently I have two methods in mind:
 1. Whenever I add or update an entry in MySQL, do the same action in ES.
 2. Run some cron jobs that periodically keep ES in sync with the data in 
 MySQL.

 For method 2, I wonder how I can check whether an entry is already indexed in 
 Elasticsearch. And would it be efficient at all if I have to check every 
 entry to see if it is updated? 

 I am new to the technology and I am afraid I had missed some really 
 obvious and established solutions here. Or otherwise the normal way this 
 situation is handled?




Re: Very open Elasticsearch installation

2014-01-08 Thread Nikolas Everett
On Wed, Jan 8, 2014 at 3:49 PM, David Pilato da...@pilato.fr wrote:

 This is really awesome Nik!
 Congrats to your team.

 I'm a bit disappointed that this search gives no result:
 https://en.wikisource.org/w/index.php?title=Special%3ASearch&profile=default&search=elasticsearch&fulltext=Search
  :-)


Looks like we don't have any books about Elasticsearch.  It does show up
here:
https://it.wikipedia.org/w/index.php?title=Speciale%3ARicerca&profile=default&search=elasticsearch&fulltext=Search
but I can't read it.  You can also find technical stuff about the
integration and our rollout plan over here:
https://www.mediawiki.org/w/index.php?title=Special%3ASearch&profile=default&search=elasticsearch&fulltext=Search.

I'll let you know when you can find
https://en.wikipedia.org/wiki/Elasticsearch with it but that might take
some time.  There is way too much search traffic for us to be the default
there.



Re: Is it possible to do a text query against a (pre)defined set of fields?

2014-01-08 Thread InquiringMind
Ville,

By default, the _all field includes all of the indexed fields. Then, for 
your private fields, explicitly exclude them from the _all field by adding 
the following to their properties:

"include_in_all": false


See the ES guide for more details. Specifically, this might help: 
http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/mapping-all-field.html
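As a sketch (the type and field names are hypothetical), a mapping that keeps two private fields out of _all could look like:

```json
{
  "mappings": {
    "doc": {
      "properties": {
        "title":       { "type": "string" },
        "secret_note": { "type": "string", "include_in_all": false },
        "owner_email": { "type": "string", "include_in_all": false }
      }
    }
  }
}
```

With this mapping, a query against _all covers only the public fields; a query that should also cover the private ones has to name them explicitly.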

I typically disable the _all field completely to cut down dramatically on 
disk space and build times. But everything else in ES has worked like a 
charm, so I'm sure this would work for you without too much trouble. Good 
luck!

Brian



Re: incrementally scaling ES from the small data

2014-01-08 Thread InquiringMind
Adolfo,

Still could not test how sockets relate to shards and why I automatically 
 get 10 established sockets when opening a client:

 node = builder.client(clientOnly).data(!clientOnly).local(local).node();

 client = node.client();


 on default ES configuration, and many many more sockets after (up to 200), 
 and how this number changes when increasing/decreasing number of shards, 


Of course, your application should create only one client and then let all 
threads within the application share that one client. Each client, 
especially the NodeClient, typically creates a thread pool behind it. It's 
a very heavy-weight object, so do not create more than one of them. But 
it's perfectly thread-safe and can (should) be used by as many threads in 
your application as desired.
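A small illustration of the share-one-client advice, using a stand-in class (not the real NodeClient API): build the heavy client exactly once and let every thread use it.

```python
import threading

class HeavyClient:
    """Stand-in for a heavy-weight client such as a NodeClient:
    expensive to construct, but thread-safe to share."""
    instances = 0

    def __init__(self):
        HeavyClient.instances += 1

    def search(self, query):
        return "results for " + query

client = HeavyClient()  # construct exactly one client...

results = []

def worker(query):
    # ...and let every thread share it instead of building its own.
    results.append(client.search(query))

threads = [threading.Thread(target=worker, args=("q%d" % i,))
           for i in range(8)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(HeavyClient.instances)  # → 1
```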

Brian 



Kibana Static Dashboard ?

2014-01-08 Thread Jay Wilson
I am modifying the guided.json dashboard. Down in the Events panel I would 
like to tell Kibana to statically filter out specific records. I tried 
adding this to the file.

  "query": {
    "filtered": {
      "query": {
        "bool": {
          "should": [
            {
              "query_string": {
                "query": "record-type: traffic-stats"
              }
            }
          ]
        }
      }
    }
  },

Doesn't appear to work.




Re: Very open Elasticsearch installation

2014-01-08 Thread joergpra...@gmail.com
I can't say it any other way: your move to ES is a landmark.

Thank you, Nik, for making this public; this helps me a lot in spreading
the word for more openness...

https://en.wikisource.org/w/index.php?title=Special%3ASearch&profile=default&search=Hello+World&fulltext=Search

The search suggestion is a bit surprising - but it does work :) and what a
difference to the old search https://de.wikisource.org/wiki/Spezial:Suche

Jörg

-- 
You received this message because you are subscribed to the Google Groups 
elasticsearch group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/CAKdsXoHKtgyS%2BKPuDNnhXbx6nc038O61TGze-mC6pVvTANMkGA%40mail.gmail.com.
For more options, visit https://groups.google.com/groups/opt_out.


Re: Kibana Static Dashboard ?

2014-01-08 Thread vineeth mohan
Hello Jay ,

Can't you do the same from the Kibana side by adding a must_not filter?
Then once you save that dashboard, you can always go back to the same link
to see the same static dashboard.

Thanks
 Vineeth


On Thu, Jan 9, 2014 at 2:42 AM, Jay Wilson jawro...@gmail.com wrote:

  I am modifying the guided.json dashboard. Down in Events panel I would
 like to tell kibana to statically filter out specific records. I tried
 adding this to the file.

   "query": {
     "filtered": {
       "query": {
         "bool": {
           "should": [
             {
               "query_string": {
                 "query": "record-type: traffic-stats"
               }
             }
           ]
         }
       }
     }
   },

 Doesn't appear to work.




Re: Kibana Static Dashboard ?

2014-01-08 Thread Jay Wilson
As I understand Kibana when a dashboard is saved, it is placed into 
elasticsearch. I don't want it in elasticsearch. I want it in a static file.



On Wednesday, January 8, 2014 2:32:50 PM UTC-7, vineeth mohan wrote:

 Hello Jay , 

 Can't you do the same from the Kibana side by adding a must_not filter?
 Then once you save that dashboard, you can always go back to the same 
 link to see the same static dashboard.

 Thanks
  Vineeth


 On Thu, Jan 9, 2014 at 2:42 AM, Jay Wilson jawr...@gmail.com wrote:

  I am modifying the guided.json dashboard. Down in Events panel I would 
 like to tell kibana to statically filter out specific records. I tried 
 adding this to the file.

   "query": {
     "filtered": {
       "query": {
         "bool": {
           "should": [
             {
               "query_string": {
                 "query": "record-type: traffic-stats"
               }
             }
           ]
         }
       }
     }
   },

 Doesn't appear to work.




allow_explicit_index and _bulk

2014-01-08 Thread Gabe Gorelick-Feldman
The documentation on URL-based access control 
(http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/url-access-control.html) 
implies that _bulk still works if you set rest.action.multi.allow_explicit_index: 
false, as long as you specify the index in the URL. However, I can't get it 
to work.

POST /foo/bar/_bulk
{ "index": {} }
{ "_id": "1234", "baz": "foobar" }

returns 

explicit index in bulk is not allowed

Should this work?



Re: incrementally scaling ES from the small data

2014-01-08 Thread Ivan Brusic
BTW, I was very wrong when I mentioned that elasticsearch uses consistent
hashing. It uses modulo-based hashing, which is why the number of shards
cannot change since the modulo is fixed. Working on too many things at once
while replying. :)
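Ivan's correction can be made concrete with a small sketch. Python's built-in hash stands in for the real routing hash, and shard_for is a hypothetical helper, not an ES API; the point is only why a fixed modulo pins the shard count:

```python
# Illustrative only: hash() stands in for Elasticsearch's real routing
# hash, and shard_for is a hypothetical helper, not an ES API.
def shard_for(doc_id, num_shards):
    # A document is routed by hashing its routing key (the _id by
    # default) modulo the number of primary shards.
    return hash(doc_id) % num_shards

# With a fixed shard count, routing is stable...
first = shard_for("doc-42", 5)
assert all(shard_for("doc-42", 5) == first for _ in range(100))

# ...but changing the shard count re-routes most documents, which is
# why the number of primary shards is fixed at index creation time.
moved = sum(1 for i in range(1000)
            if shard_for("doc-%d" % i, 5) != shard_for("doc-%d" % i, 6))
print(moved > 0)  # → True
```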


On Wed, Jan 8, 2014 at 1:10 PM, InquiringMind brian.from...@gmail.com wrote:

 Adolfo,


 Still could not test how sockets relate to shards and why I automatically
 get 10 established sockets when opening a client:

 node = builder.client(clientOnly).data(!clientOnly).local(local).node();

 client = node.client();


 on default ES configuration, and many many more sockets after (up to
 200), and how this number changes when increasing/decreasing number of
 shards,


 Of course, your application should create only one client and then let all
 threads within the application share that one client. Each client,
 especially the NodeClient, typically creates a thread pool behind it. It's
 a very heavy-weight object, so do not create more than one of them. But
 it's perfectly thread-safe and can (should) be used by as many threads in
 your application as desired.

 Brian



Re: No hit using scan/scroll with has_parent filter

2014-01-08 Thread Martijn v Groningen
Hi Jean,

Can you share how you execute the scan request with the has_parent filter?
(via a gist or something like that)

Martijn


On 8 January 2014 15:17, Jean-Baptiste Lièvremont 
jean-baptiste.lievrem...@sonarsource.com wrote:

 Hi folks,

 I use a parent/child mapping configuration which works flawlessly with
  classic search requests, e.g. using has_parent to find child documents
 with criteria on the parent documents.

 I am trying to get all child document IDs that match a given set of
 criteria using scan and scroll, which also works well - until I introduce
 the has_parent filter, in which case the scroll request returns no hit
 (although total_hits is correct).

 Is it a known issue?

 I can provide sample mapping files and queries with associated/expected
 results. Please note that this behavior has been noticed on 0.90.6 but is
 still present in 0.90.9.

 Thanks, best regards,
 -- Jean-Baptiste Lièvremont





-- 
Kind regards,

Martijn van Groningen
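For illustration, a request of the shape being discussed might be sketched as follows. This is an unverified sketch: the index, type, and field names ("myindex", "employee", "company", "country") are made up, and the exact `has_parent` options should be checked against the 0.90.x docs.

```python
import json

# Hedged sketch of a scan request body: a has_parent filter wrapped in a
# filtered query, selecting child docs whose parent matches a term.
scan_body = {
    "query": {
        "filtered": {
            "query": {"match_all": {}},
            "filter": {
                "has_parent": {
                    "parent_type": "company",
                    "query": {"term": {"country": "fr"}},
                }
            },
        }
    },
    # An empty fields list is intended to return only metadata (such as _id)
    # for each child hit.
    "fields": [],
}

payload = json.dumps(scan_body)
# This body would be POSTed to
#   /myindex/employee/_search?search_type=scan&scroll=1m
# and the returned _scroll_id then fed to /_search/scroll?scroll=1m.
```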



SSL and org.elasticsearch.transport.NodeDisconnectedException

2014-01-08 Thread Maciej Stoszko


I have an es_client (java/dropwizard) application. It communicates with 
elasticsearch just fine over a plaintext connection. 

I have followed the instructions at 
https://github.com/sonian/elasticsearch-jetty to set up SSL for es. 

However, when I start my es_client, it reports the following every 5 seconds:

INFO [2014-01-08 23:02:14,814] org.elasticsearch.client.transport: 
[Karolina Dean] failed to get node info for 
[#transport#-1][inet[localhost/127.0.0.1:9443]], disconnecting... ! 
org.elasticsearch.transport.NodeDisconnectedException: 
[][inet[localhost/127.0.0.1:9443]][cluster/nodes/info] disconnected

How can I go about figuring this one out?

Thanks, 

Maciej

 



Re: SSL and org.elasticsearch.transport.NodeDisconnectedException

2014-01-08 Thread Maciej Stoszko


On Wednesday, January 8, 2014 5:19:10 PM UTC-6, Maciej Stoszko wrote:

 I have an es_client (java/dropwizard) application. It communicates with 
 the elasticsearch just fine over plaintext connection. 

 I have followed the instructions at 
 https://github.com/sonian/elasticsearch-jetty to set up SSL for es. 

 However when I start my es_client it reports every 5 seconds the following:

 INFO [2014-01-08 23:02:14,814] org.elasticsearch.client.transport: 
 [Karolina Dean] failed to get node info for 
 [#transport#-1][inet[localhost/127.0.0.1:9443]], disconnecting... ! 
 org.elasticsearch.transport.NodeDisconnectedException: 
 [][inet[localhost/127.0.0.1:9443]][cluster/nodes/info] disconnected

 How can I go about figuring this one out?

 Thanks, 

 Maciej

 Actually, digging around a bit more, I think I should revise my question:
Is it currently possible to have the Java API client talk to Elasticsearch 
via SSL?
I see that https://github.com/elasticsearch/elasticsearch/pull/2105 (Add 
SSL support to Netty transport layer for Client/Node-to-Node communication) 
was rejected.
Maybe it is simply a feature which does not (yet) exist. 

  




Re: How to index an existing json file

2014-01-08 Thread ZenMaster80
Thank you for the binary flag tip. It is also in the documentation here:

http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/docs-bulk.html

On Tuesday, January 7, 2014 9:00:33 PM UTC-5, ZenMaster80 wrote:

 Hi,

 I am just starting with ElasticSearch, I would like to know how to index a 
 simple json document books.json that has the following in it: Where do I 
 place the document? I placed it in root directory of elastic search and in 
 /bin folder..

 {"books":[{"name":"life in heaven","author":"Mike Smith"},{"name":"get 
 rich","author":"Joe Shmoe"},{"name":"luxury properties","author":"Linda 
 Jones"}]}


 $ curl -XPUT "http://localhost:9200/books/book/1" -d @books.json

 Warning: Couldn't read data from file books.json, this makes an empty 
 POST.

 {error:MapperParsingException[failed to parse, document is 
 empty],status:400}


 Thanks
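For what it's worth, here is a small sketch of turning a wrapper document like the books.json above into the newline-delimited payload the bulk endpoint expects. The index/type names and ids are illustrative, and the JSON has been repaired (the quoted version had an unbalanced brace):

```python
import json

# Corrected contents of books.json.
books_doc = {
    "books": [
        {"name": "life in heaven", "author": "Mike Smith"},
        {"name": "get rich", "author": "Joe Shmoe"},
        {"name": "luxury properties", "author": "Linda Jones"},
    ]
}

def to_bulk_payload(doc, index="books", doc_type="book"):
    """Emit one action line plus one source line per book; the bulk API
    requires a trailing newline after the last line."""
    out = []
    for i, book in enumerate(doc["books"], start=1):
        out.append(json.dumps({"index": {"_index": index, "_type": doc_type, "_id": str(i)}}))
        out.append(json.dumps(book))
    return "\n".join(out) + "\n"

payload = to_bulk_payload(books_doc)
```

The payload could then be sent with `curl -XPOST "http://localhost:9200/_bulk" --data-binary @payload.json` — `--data-binary` rather than `-d`, so the newlines are preserved, as noted above.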




Filter and Query same taking some time

2014-01-08 Thread Arjit Gupta
Hi, 

I had implemented ES search queries for all our use cases, but when I learned 
that some of our use cases can be solved by filters, I implemented that too. 
However, I don't see any gain (in response time) from the filters. My search 
queries are:

1. Filter 

{
  "size" : 100,
  "query" : {
    "match_all" : { }
  },
  "filter" : {
    "bool" : {
      "must" : {
        "term" : {
          "color" : "red"
        }
      }
    }
  },
  "version" : true
}


2. Query 

{
  "size" : 100,
  "query" : {
    "bool" : {
      "must" : {
        "match" : {
          "color" : {
            "query" : "red",
            "type" : "boolean",
            "operator" : "AND"
          }
        }
      }
    }
  },
  "version" : true
}

By default the term filter should be cached, but I don't see a performance 
gain. 
Do I need to change some parameter as well?
I am using ES 0.90.1 with 16 GB of heap space given to ES. 

Thanks,
Arjit
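One thing worth checking (a sketch, not a verified fix for this dataset): in 0.90.x a top-level "filter" is applied to the hits after the query runs, so the cached term filter may not be restricting the document set the way a filter inside the query would. The form usually recommended is a "filtered" query:

```python
import json

# Sketch of the same search expressed as a "filtered" query, so the term
# filter (cached by default) restricts documents before any scoring work.
filtered_request = {
    "size": 100,
    "query": {
        "filtered": {
            "query": {"match_all": {}},
            "filter": {"term": {"color": "red"}},
        }
    },
    "version": True,
}

body = json.dumps(filtered_request)
```

Whether this helps here depends on the dataset and on repeated execution, since the filter cache only pays off after the first run.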



Why not use rivers in production?

2014-01-08 Thread Warner Onstine
I'm getting reintroduced to ES, and a co-worker recommended I listen to the
webinar intro that Drew Raines gave, as he mentioned something specific
about rivers.

Listening through it I heard him say that they (assuming ES) don't
recommend using rivers in production because it's tied to one node.

Having looked through a lot of the documentation on rivers I do see that
you can specify which rivers run on which nodes so I wasn't sure what the
exact implication was of this statement.

Drew? Or anyone else care to comment?

We're getting ready to push all of our data from MongoDB into ES so that we
can search it and use Kibana for analysis so any insight into this would be
great, thank you :).

-warner



Re: High load average running on ES node

2014-01-08 Thread Arjit Gupta
Hi Jörg,

Thanks a lot for your detailed reply.
Can you please explain how I can *reconfigure ES for efficient cache
usage*?

Thanks,
Arjit

Thanks ,
Arjit


On Sun, Jan 5, 2014 at 10:55 PM, joergpra...@gmail.com 
joergpra...@gmail.com wrote:

 The load is not much a surprise for an 8 core CPU node, I have also
 observed loads of 80-100.

 This high load, when induced by indexing, can be significantly reduced
 when using a high performance input/output disk subsystem, such as SSD. The
 disks are the slowest part in the system and generate high I/O wait which
 is responsible for increasing the CPU load.

 GC does generate high load too, this is mostly related to expensive
 queries that use filters or caches. The overall performance of the JVM is
 getting very poor in that case.

 You have several options:
 - rewriting queries or reconfiguring ES for efficient cache usage
 - adding nodes
 - decrease the heap slightly to smooth the steep edge when stop-the-world
 GC kicks in (but this depends on the workload if your ES cluster can work
 with less heap)

 G1 GC does not help against query/filter load and does not decrease CPU
 load; in fact, it puts more CPU load on the machines, trading that for
 shorter stop-the-world pauses. G1 GC helps to keep the stop-the-world
 periods under a certain limit so ES nodes do not disconnect as easily. It
 has no steep edge when performing stop-the-world GC phases.

 Please note, currently G1 GC seems safe only with Java 7 or Java 8 and ES
 versions that have replaced GNU trove4j with the HPPC library, that is,
 0.90.9 or 1.0.0.Beta2.

 Jörg
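By way of illustration only, "reconfiguring ES for efficient cache usage" usually means capping the filter and field-data caches in elasticsearch.yml. The setting names below are believed to apply to 0.90.x but should be verified against the docs for your exact version, and the values are placeholders, not recommendations:

```yaml
# elasticsearch.yml — illustrative values, verify against your 0.90.x docs
indices.cache.filter.size: 20%        # upper bound for the filter cache
indices.fielddata.cache.size: 30%     # upper bound for field data
indices.fielddata.cache.expire: 10m   # optional time-based eviction
```

Smaller caches mean more evictions but a smaller, more predictable heap footprint, which can reduce the stop-the-world GC pressure described above.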





Re: Pls help me: i insert log to elasticsearch, but it use too much memory, how to solve it?thanks

2014-01-08 Thread xjj210130
 The env is the following:
  -- elasticsearch v0.90 (I use 0.90.9; the problem still exists)
  -- Java version is 1.7.0_45

On Wednesday, January 8, 2014 6:58:02 PM UTC+8, xjj2...@gmail.com wrote:

 Dear all:
    I insert 1 logs into elasticsearch; each log is about 2M and 
 contains about 3000 keys and values.
  When I have inserted about 2, it uses about 30G of memory, and then 
 elasticsearch becomes very slow and it is hard to insert more logs.
  Could someone help me how to solve it? Thanks very much.




Re: Pls help me: i insert log to elasticsearch, but it use too much memory, how to solve it?thanks

2014-01-08 Thread David Pilato
Just wondering if you are hitting the same RAM usage when inserting without 
thrift?
Could you test it?

Could you gist as well what gives: 

curl -XGET 'http://localhost:9200/_nodes?all=true&pretty=true'


-- 
David Pilato | Technical Advocate | Elasticsearch.com
@dadoonet | @elasticsearchfr


On 9 January 2014 at 07:11:33, xjj210...@gmail.com (xjj210...@gmail.com) 
wrote:

 The env is following:
     --elasticseasrch  v0.90(  i use 0.90.9 , the problem is still exist).
     -- java version is 1.7.0_45

On Wednesday, January 8, 2014 6:58:02 PM UTC+8, xjj2...@gmail.com wrote:
Dear all:
       I insert 1 logs to elasticsearch, each log is about 2M, and there 
are about 3000 keys and values.
 when i insert about 2, it used about 30G memory, and then elasticsearch is 
very slow, and it's hard to insert log.
 Could someone help me how to solve it? Thanks very much.



Re: Why not use rivers in production?

2014-01-08 Thread David Pilato
A river instance is a singleton in the cluster.
It means that a river is working only on a single node.

It could be reallocated on another node when the first node fails.

I think that's what Drew meant. Basically, rivers do not scale.

My 2 cents

-- 
David Pilato | Technical Advocate | Elasticsearch.com
@dadoonet | @elasticsearchfr


On 9 January 2014 at 06:15:24, Warner Onstine (warn...@gmail.com) wrote:

Getting reintroduced to ES and a co-worker recommended I listen to the webinar 
intro that Drew Raines gave as he mentioned something specific about rivers.

Listening through it I heard him say that they (assuming ES) don't recommend 
using rivers in production because it's tied to one node.

Having looked through a lot of the documentation on rivers I do see that you 
can specify which rivers run on which nodes so I wasn't sure what the exact 
implication was of this statement.

Drew? Or anyone else care to comment?

We're getting ready to push all of our data from MongoDB into ES so that we can 
search it and use Kibana for analysis so any insight into this would be great, 
thank you :).

-warner



Re: Filter and Query same taking some time

2014-01-08 Thread David Pilato
You probably won't see any difference the first time you execute it unless you 
are using warmers.
With a second query, you should see the difference.

How many documents you have in your dataset?

-- 
David Pilato | Technical Advocate | Elasticsearch.com
@dadoonet | @elasticsearchfr


On 9 January 2014 at 06:14:06, Arjit Gupta (arjit...@gmail.com) wrote:

Hi, 

I had implemented ES search query  for all our use cases but when i came to 
know that some of our use cases can be solved by filters I implemented that but 
I dont see any gain (in response time) in filters. My search queries  are 

1. Filter 

{
  "size" : 100,
  "query" : {
    "match_all" : { }
  },
  "filter" : {
    "bool" : {
      "must" : {
        "term" : {
          "color" : "red"
        }
      }
    }
  },
  "version" : true
}


2. Query 

{
  "size" : 100,
  "query" : {
    "bool" : {
      "must" : {
        "match" : {
          "color" : {
            "query" : "red",
            "type" : "boolean",
            "operator" : "AND"
          }
        }
      }
    }
  },
  "version" : true
}

By default the term query should be cached but I dont see a performance gain. 
Do i need to change some parameter also  ?
I am using ES  0.90.1 and with 16Gb of heap space given to ES. 

Thanks,
Arjit



Re: Filter and Query same taking some time

2014-01-08 Thread Arjit Gupta
I have 100,000 documents which are similar. In the response I am getting the
whole document, not just the id.
I am executing the query multiple times.

Thanks ,
Arjit


On Thu, Jan 9, 2014 at 1:06 PM, David Pilato da...@pilato.fr wrote:

 You probably won't see any difference the first time you execute it unless
 you are using warmers.
 With a second query, you should see the difference.

 How many documents you have in your dataset?

 --
 *David Pilato* | *Technical Advocate* | *Elasticsearch.com*
 @dadoonet https://twitter.com/dadoonet | 
 @elasticsearchfrhttps://twitter.com/elasticsearchfr


 On 9 January 2014 at 06:14:06, Arjit Gupta (arjit...@gmail.com) wrote:

 Hi,

 I had implemented ES search query  for all our use cases but when i came
 to know that some of our use cases can be solved by filters I implemented
 that but I dont see any gain (in response time) in filters. My search
 queries  are

 1. Filter

 {
   "size" : 100,
   "query" : {
     "match_all" : { }
   },
   "filter" : {
     "bool" : {
       "must" : {
         "term" : {
           "color" : "red"
         }
       }
     }
   },
   "version" : true
 }


 2. Query

 {
   "size" : 100,
   "query" : {
     "bool" : {
       "must" : {
         "match" : {
           "color" : {
             "query" : "red",
             "type" : "boolean",
             "operator" : "AND"
           }
         }
       }
     }
   },
   "version" : true
 }

 By default the term query should be cached but I dont see a performance
 gain.
 Do i need to change some parameter also  ?
 I am using ES  0.90.1 and with 16Gb of heap space given to ES.

 Thanks,
 Arjit





Re: SSL and org.elasticsearch.transport.NodeDisconnectedException

2014-01-08 Thread David Pilato
The jetty plugin replaces the HTTP layer (9200), not the transport layer 
(9300). The TransportClient uses the transport layer (9300).


-- 
David Pilato | Technical Advocate | Elasticsearch.com
@dadoonet | @elasticsearchfr


On 9 January 2014 at 02:02:48, Maciej Stoszko (maciek...@gmail.com) wrote:



On Wednesday, January 8, 2014 5:19:10 PM UTC-6, Maciej Stoszko wrote:
I have an es_client (java/dropwizard) application. It communicates with the 
elasticsearch just fine over plaintext connection. 

I have followed the instructions at 
https://github.com/sonian/elasticsearch-jetty to set up SSL for es. 

However when I start my es_client it reports every 5 seconds the following:

INFO [2014-01-08 23:02:14,814] org.elasticsearch.client.transport: [Karolina 
Dean] failed to get node info for 
[#transport#-1][inet[localhost/127.0.0.1:9443]], disconnecting... ! 
org.elasticsearch.transport.NodeDisconnectedException: 
[][inet[localhost/127.0.0.1:9443]][cluster/nodes/info] disconnected

How can I go about figuring this one out?

Thanks, 

Maciej


Actually digging around a bit more, I think I should revise my question:
Is it currently possible to have JAVA API client talking to Elasticsearch via 
SSL.
I see that https://github.com/elasticsearch/elasticsearch/pull/2105 (Add SSL 
support to Netty transport layer for Client/Node-to-Node communication) was 
rejected.
Maybe it is simply a feature which does not (yet) exist. 
 



Elasticsearch Hadoop

2014-01-08 Thread Badal Mohapatra
Hi,

   To index Hadoop data into elasticsearch, as I understand it, we create an 
external table with the EsStorageHandler and then copy the data from another 
internal Hive table. Doesn't this duplicate the data in HDFS?
Is there any way to index from the internal Hive tables directly, instead 
of having two tables with the same data?

Kind Regards,
Badal



Re: Converting queries returning certain distinct records to ES

2014-01-08 Thread David Pilato
Maybe you could find a way to do this with a single query if you design your 
documents in another way?
Or use facets for the first query and an ids filter for the second?
It's hard to tell without a concrete example of JSON documents.

-- 
David Pilato | Technical Advocate | Elasticsearch.com
@dadoonet | @elasticsearchfr


On 9 January 2014 at 01:28:06, heat...@hodgetastic.com 
(heat...@hodgetastic.com) wrote:

Hello

I am currently trying to migrate an sql application to Elasticsearch. 

I need to be able to select a collection of results from an index which, for 
given search conditions, have distinct pairings of two certain columns. In sql 
I do the following two queries:

Query 1:

SELECT column_A, column_B, GROUP_CONCAT(table_name.id) id FROM `table_name` 
WHERE `column_?` = 'something' GROUP BY column_A, column_B, column_?
Query 2:

 SELECT `table_name`.* FROM `table_name` WHERE `column_?` = 'something' 
AND (`table_name.id` IN (ids_from_previous_query))
The first query returns me a list of ids from table_name such that each id 
satisfies the condition `column_?` = 'something' and the record with that id 
has a distinct [column_A,column_B]

The second query then returns me all the records satisfying `column_?` = 
'something' but only from that range of ids (I realise I probably do not need 
to repeat `column_?` = 'something' in the second query.)

The result is that each record returned by the second query satisfies the 
condition `column_?` = 'something', and I am returned only one record for 
each [column_A, column_B] pairing.

Since there is not really a 'distinct' option yet I am having trouble finding a 
way replicate this output with ES and wondered if anyone might have any 
thoughts as how I might go about it?

At the moment I am open to any mapping / query combinations that will achieve 
what I need.
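To make the facets-then-ids idea concrete, here is a rough sketch. The column names are the poster's placeholders, the `script_field` expression and facet size are assumptions to verify against the 0.90 terms facet docs, and note a gap versus the SQL: a terms facet returns the distinct pair values and counts, not the ids, so a follow-up lookup per facet term would still be needed to choose one id per pair before running step 2.

```python
import json

# Step 1: a terms facet over the concatenated pair finds the distinct
# (column_A, column_B) combinations among matching documents.
facet_request = {
    "size": 0,
    "query": {"term": {"column_x": "something"}},
    "facets": {
        "distinct_pairs": {
            "terms": {
                "script_field": "_source.column_A + '|' + _source.column_B",
                "size": 1000,
            }
        }
    },
}

# Step 2: fetch one representative document per pair with an ids filter,
# using one id chosen for each facet term found in step 1 (values made up).
ids_request = {
    "query": {
        "filtered": {
            "query": {"match_all": {}},
            "filter": {"ids": {"values": ["1", "7", "42"]}},
        }
    }
}

bodies = [json.dumps(facet_request), json.dumps(ids_request)]
```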

