More like this scoring algorithm unclear

2014-01-08 Thread Maarten Roosendaal
Hi,

I have a question about why the 'more like this' algorithm scores some documents 
higher than others, while they are (at first glance) the same.

What I've done is index wishlist documents that contain one property, 
product_id, which holds an array of product ids (e.g. [1234, …]). What I'm 
trying to do is find similar wishlists for a given wishlist with id x. The MLT 
API seems to work: it returns other documents that contain at least one of the 
product_ids from the original list.

But here is what I see: for example, I get 10 hits, and the first 6 hits 
contain the same (and only one) product_id, which is present in the original 
wishlist. I would expect the scores of those first 6 to be identical. Instead, 
only the first 2 share the same score; the next 2 have a lower score and the 
next 2 an even lower one. Why is this?

Also, I'm trying to rewrite the MLT API call as an MLT query, but somehow it 
doesn't work. I would expect that I need to take the entire content of the 
original product_id property and feed it as input to 'like_text'. The 
documentation is not very clear and doesn't provide examples, so I'm a little 
lost.
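For reference, my understanding of the query body I need to build is sketched below in Python; the id values in like_text are made-up placeholders, and the parameter choices are my guess:

```python
import json

# Sketch of a more_like_this query body (0.90.x-era syntax).
# "product_id" is the field from my mapping; the like_text value is a
# hypothetical placeholder for the space-separated ids of the source
# wishlist. min_term_freq/min_doc_freq are lowered because each id
# occurs only once per document and may be rare across the index.
mlt_query = {
    "query": {
        "more_like_this": {
            "fields": ["product_id"],
            "like_text": "1234 5678",  # all ids of the original wishlist
            "min_term_freq": 1,
            "min_doc_freq": 1,
        }
    }
}
print(json.dumps(mlt_query, indent=2))
```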

Hope someone can give some pointers.

Thanks,
Maarten

-- 
You received this message because you are subscribed to the Google Groups 
elasticsearch group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/0e2827b2-5a21-4cff-b773-ebdd861c5972%40googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.


Re: How many metadata fields exist of MP3 file ?

2014-01-08 Thread David Pilato
Have a look at 
https://github.com/elasticsearch/elasticsearch-mapper-attachments/blob/master/src/main/java/org/elasticsearch/index/mapper/attachment/AttachmentMapper.java#L376

You will see that mapper attachment reads:

Metadata.DATE
Metadata.TITLE
Metadata.AUTHOR
Metadata.KEYWORDS
Metadata.CONTENT_TYPE
Metadata.CONTENT_LENGTH

Does it help?
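For completeness, those extracted values end up as sub-fields of the attachment field. A minimal sketch of building such an index request (the field name "file" and the file bytes are hypothetical; the document must be base64-encoded before it is sent):

```python
import base64
import json

# Sketch: build the JSON body for indexing a binary file through the
# mapper-attachments plugin. The attachment field name ("file") and the
# input bytes are hypothetical; at index time the plugin extracts the
# metadata fields listed above (e.g. file.title, file.author, ...).
def attachment_body(raw_bytes):
    return {"file": base64.b64encode(raw_bytes).decode("ascii")}

body = attachment_body(b"ID3...fake mp3 bytes...")
print(json.dumps(body))
```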

-- 
David Pilato | Technical Advocate | Elasticsearch.com
@dadoonet | @elasticsearchfr


On 8 January 2014 at 05:05:10, HongXuan Ji (hxua...@gmail.com) wrote:

Hi all,

I am wondering how many metadata fields of MP3 files exist when I post an mp3 
file into Elasticsearch using the mapper-attachments plugin.

In Solr we can inspect the extracted fields through the endpoint 
SOLR_HOST/update/extract?extractOnly=true, but is there any way to get such 
information in Elasticsearch? And besides MP3 files, what about doc files?

I know Elasticsearch uses Tika to support this; can you give me an example of 
fetching a particular field of a particular file format?

Regards,

Ivan 




Re: ElasticsearchHadoop Hive integration issue

2014-01-08 Thread Badal Mohapatra
Hi Costin,

   Thanks for your kind reply.
After specifying the type in es.resource I am now able to index.

I am using M1, will try with master once indexing is done.

Regards,
Badal


On Tuesday, 7 January 2014 16:21:01 UTC+5:30, Costin Leau wrote:

 Hi, 

 The 'es.resource' you specified is incorrect - you need to specify both an 
 index and a type - e.g. myIndex/products 


 P.S. Are you using M1 or the current master - the latter should give a 
 proper error (and message). 
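Concretely, that means the TBLPROPERTIES line of the table below needs both parts, e.g. (the index name 'es_index' here is hypothetical):

```sql
-- es.resource must name both an index and a type
-- ('es_index' is a hypothetical index name; 'products' is the type)
TBLPROPERTIES('es.resource' = 'es_index/products');
```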

 Thanks, 


 On 07/01/2014 9:48 AM, Badal Mohapatra wrote: 
  Hi, 
  
  I am trying to index data from a hive table to elasticsearch and am 
 using the latest elasticsearch-hadoop-master plugin. 
  My elasticsearch version is 0.90.9 and hive version is hive-0.11.0. 
  
  As per the documentation of elasticsearch-hadoop plugin (hive 
 integration), I successfully created an external table 
  with the below command 
  
  CREATE EXTERNAL TABLE es_products ( 
  sku int, 
  rating float, 
  name string, 
  type string, 
  saleprice float, 
  department string, 
  manufacturer string, 
  userid string, 
  category_name string, 
  query string) 
  STORED BY 'org.elasticsearch.hadoop.hive.ESStorageHandler' 
  TBLPROPERTIES('es.resource' = 'products'); 
  
  Even though the external table is created, 
  I am not able to either insert data into it or even query it. 
  When I do a select * from es_products; 
  I get the exception below. 
  
  hive> select * from es_products; 
  OK 
  Failed with exception 
 java.io.IOException: java.lang.StringIndexOutOfBoundsException: String index 
 out of range: -1 
  Time taken: 1.699 seconds 
  
  
  Can someone please suggest what / where I am going wrong? 
  
  Kind Regards, 
  Badal 
  
  
  
 -- 
 Costin 




Re: How to query custom rest handler in elastic search using Java api

2014-01-08 Thread Shishir Kumar
Hi,

I am not facing any issue with the NodesInfoAction or the custom endpoint 
code. The rest endpoint works fine if I curl it: 
curl -XGET 'localhost:9200/_mastering/nodes?pretty'

I am trying to find out a way to do this from an embedded node. In other 
words, something like below:

Node node = 
NodeBuilder.nodeBuilder().clusterName("elasticsearch").node();
Client client = node.client();
SearchResponse response = 
client.prepareSearch().setSearchType("/_mastering/nodes").
setQuery(QueryBuilders.queryString()).
execute().actionGet();

P.S. the code snippet doesn't actually work, but I want to query 
/_mastering/nodes through the Java API.

On Friday, 3 January 2014 13:35:13 UTC+5:30, Jörg Prante wrote:

 You have wrapped a NodesInfoAction, so all you have to do is

 NodesInfoResponse response = client.admin().cluster().
 prepareNodesInfo().all().execute().actionGet();

 That is the Java API.

 Jörg





Re: Replicating one cluster to another cluster

2014-01-08 Thread joergpra...@gmail.com
First, and most important, the good news: ES 1.0.0.Beta2 has the
snapshot/restore feature in place, so it should be easy to snapshot and
restore the result to a target cluster. Snapshots are also
incremental.
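As a sketch, the snapshot flow looks like this (the repository name and filesystem location below are hypothetical, and the request bodies are shown as Python dicts for brevity):

```python
import json

# Sketch of the ES 1.0 snapshot/restore flow. The repository name
# ("my_backup") and location are hypothetical; the location must be
# reachable from every node of the source cluster.
register_repo = {
    "type": "fs",
    "settings": {"location": "/mnt/es_backups/my_backup"},
}

# The calls, in order (shown as comments since this is only a sketch):
#   PUT  /_snapshot/my_backup                  body: register_repo
#   PUT  /_snapshot/my_backup/snap_1           takes an incremental snapshot
#   POST /_snapshot/my_backup/snap_1/_restore  restores on the target cluster
print(json.dumps(register_repo))
```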

Second, there is also news about the knapsack plugin.

In the next knapsack plugin version due this week, a full copy from
cluster1 to cluster2 will be as simple as

curl -XPOST 'http://cluster1node:port1
/_export/copy?cluster=cluster2name&host=cluster2node&port=port2'

The limitations will be: the knapsack plugin must be installed on
cluster1node, the JVM version must be the same in cluster1 and cluster2, the
ES version must be the same in cluster1 and cluster2, and all your indexes
must have stored fields, preferably the _source field. Also, cluster1 must
not modify the indexes while the _export/copy is running, or cluster2 may
end up with different data (there is no inherent locking).

In the new knapsack export version, you will be able to use arbitrary ES
queries to select subsets of the cluster data to copy, so only the hits of
a query can be transferred.

Jörg



Re: Unique Count in aggregations

2014-01-08 Thread Vaidik Kapoor
I haven't tried the aggregations module, but if what you want is unique
terms, I think you can do that using the terms facet as well. In that case,
you will have to choose a size large enough that ES returns all the terms
and does not discard any when size is smaller than the number of unique
terms.
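Something along these lines (the field name and size below are placeholders; size just needs to exceed the number of distinct terms you expect):

```python
import json

# Sketch: a terms facet used to approximate a unique-terms listing.
# "my_field" and the size are hypothetical; counting the returned terms
# gives the unique count as long as size exceeds the true cardinality.
facet_request = {
    "size": 0,  # we only want the facet, not the hits
    "facets": {
        "unique_values": {
            "terms": {"field": "my_field", "size": 10000}
        }
    },
}
print(json.dumps(facet_request))
```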

Vaidik Kapoor
vaidikkapoor.info


On 8 January 2014 14:41, Konstantinos Zacharakis kzach...@gmail.com wrote:

 Hello,

 I would like to ask about the support of unique terms in aggregations.
 Shay mentioned in issue #1044
 (https://github.com/elasticsearch/elasticsearch/issues/1044) that once the
 aggregation framework was done you planned to add this new feature. Since
 aggregations have been available since Beta2, how close on your roadmap is
 unique-terms support? Should we expect it in the 1.0.0 release?

 Kind Regards
 Kostas






Re: Unique Count in aggregations

2014-01-08 Thread Konstantinos Zacharakis
Hi Vaidik,

This method is fine when the term cardinality is low, and it can also be 
achieved using the aggregations framework. However, when the cardinality is 
high, the memory footprint will also be high, and that is certainly not 
safe.




cassandra river plugin installation issue

2014-01-08 Thread shamsul haque
I have downloaded the river from: https://github.com/eBay/cassandra-river

and changed the settings in CassandraRiver.java to match my Cassandra 
setup:

if (riverSettings.settings().containsKey("cassandra")) {
    @SuppressWarnings("unchecked")
    Map<String, Object> couchSettings = (Map<String, Object>) 
riverSettings.settings().get("cassandra");
    this.clusterName = 
XContentMapValues.nodeStringValue(couchSettings.get("cluster_name"), "Test 
Cluster");
    this.keyspace = 
XContentMapValues.nodeStringValue(couchSettings.get("keyspace"), 
"topic_space");
    this.columnFamily = 
XContentMapValues.nodeStringValue(couchSettings.get("column_family"), 
"users");
    this.batchSize = 
XContentMapValues.nodeIntegerValue(couchSettings.get("batch_size"), 1000);
    this.hosts = 
XContentMapValues.nodeStringValue(couchSettings.get("hosts"), 
"localhost:9160");
    this.username = 
XContentMapValues.nodeStringValue(couchSettings.get("username"), 
"USERNAME");
    this.password = 
XContentMapValues.nodeStringValue(couchSettings.get("password"), "P$$WD");
} else {
    /*
     * Set default values
     */
    this.clusterName = "Test Cluster";
    this.keyspace = "topic_space";
    this.columnFamily = "users";
    this.batchSize = 1000;
    this.hosts = "localhost:9160";
    this.username = "USERNAME";
    this.password = "P$$WD";
}

When I build with Maven using the given command, mvn clean package, the 
Maven test log shows:

---
 T E S T S
---
Running org.elasticsearch.river.cassandra.CassandraRiverIntegrationTest
Configuring TestNG with: 
org.apache.maven.surefire.testng.conf.TestNG652Configurator@67eaf25d
Exception in thread Queue-Indexer-thread-0 java.lang.NullPointerException
at 
org.elasticsearch.river.cassandra.CassandraRiver$Indexer.run(CassandraRiver.java:149)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:744)
Exception in thread Queue-Indexer-thread-2 java.lang.NullPointerException
at 
org.elasticsearch.river.cassandra.CassandraRiver$Indexer.run(CassandraRiver.java:149)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:744)
Exception in thread Queue-Indexer-thread-5 java.lang.NullPointerException
at 
org.elasticsearch.river.cassandra.CassandraRiver$Indexer.run(CassandraRiver.java:149)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:744)
Exception in thread Queue-Indexer-thread-4 java.lang.NullPointerException

I tried the same after installing the plugin in ES; it shows the same error 
continuously.
Does anybody have any idea what's going wrong with my setup?




Re: cassandra river plugin installation issue

2014-01-08 Thread shamsul haque
CassandraRiver.java:149 contains:
logger.info("Starting thread with {} keys", this.keys.rowColumnMap.size());
where this.keys, or its rowColumnMap, may be null, hence the 
NullPointerException.

At first I built the river module normally and installed it as a plugin 
in ES.
But when I ran this script:
curl -XPUT 'localhost:9200/_river/userinfo/_meta' -d '{
    "type" : "cassandra",
    "cassandra" : {
        "cluster_name" : "Test Cluster",
        "keyspace" : "topic_space",
        "column_family" : "users",
        "batch_size" : 100,
        "hosts" : "localhost:9160"
    },
    "index" : {
        "index" : "userinfo",
        "type" : "users"
    }
}'


the same error appears in the ES console, the same as the one I copied from 
the Maven console, and it is also not fetching data from Cassandra into ES.



Re: How many metadata fields exist of MP3 file ?

2014-01-08 Thread David Pilato
I would recommend not using the mapper attachment but managing that on your 
side.
For example, I removed the mapper attachment from the fsriver project to get 
finer control (see https://github.com/dadoonet/fsriver/issues/38).

BTW, I'm not aware of how you can get the ALBUM field using Tika. Any pointer? 
It could be nice to add it to fsriver as well.

-- 
David Pilato | Technical Advocate | Elasticsearch.com
@dadoonet | @elasticsearchfr


On 8 January 2014 at 10:49:47, HongXuan Ji (hxua...@gmail.com) wrote:

Thanks for the reply.

Except for the six standard fields, I also want to know about extra fields. 
For example, in Solr we can extract the album field of an MP3 file.
Is this also supported in Elasticsearch? I just tested: I posted an mp3 
file into ES, but the indexed document contains only the six fields.

Ideas?

Thanks a lot.



Re: cassandra river plugin installation issue

2014-01-08 Thread shamsul haque
One change I made in the cassandra-river project was to bump the Cassandra 
jar version from 1.3 to 2.0.3 in pom.xml, as I am using Cassandra 2.0.4.
Any idea what's going wrong?



Re: cassandra river plugin installation issue

2014-01-08 Thread David Pilato
So probably

CassandraCFData cassandraData = db.getCFData(columnFamily, start, 1000);

did not get any data from Cassandra?


I've never played with this plugin, or with Cassandra, so I'm afraid I can't 
help more here!


-- 
David Pilato | Technical Advocate | Elasticsearch.com
@dadoonet | @elasticsearchfr





Re: cassandra river plugin installation issue

2014-01-08 Thread shamsul haque
OK, thanks for pointing this out.



Pls help me: i insert log to elasticsearch, but it use too much memory, how to solve it?thanks

2014-01-08 Thread xjj210130
Dear all:
   I insert 1 logs into elasticsearch; each log is about 2M and has 
about 3000 keys and values.
 When I have inserted about 2, it uses about 30G of memory, and then 
elasticsearch becomes very slow and it's hard to insert logs.
 Could someone help me solve this? Thanks very much.



Re: Order results by value in one of the array entries.

2014-01-08 Thread Johan E
Hi Jun,

Thanks for your reply.

I'm not sure how I can get that to work. In my project I need to 
boost/order only by the stock of warehouse_a; how do I use only the value 
of that entry in the array?
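For concreteness, I imagine the script-based sort would look something like the sketch below; the script body is my guess at the syntax, and I am not sure it is right:

```python
import json

# Sketch of a script-based sort that picks out the stock of one
# warehouse. The script body is an assumption (old MVEL-style syntax);
# with arrays of objects the fields get flattened, so a robust version
# may need the inventory mapped as a nested type instead.
sort_clause = {
    "sort": {
        "_script": {
            "script": (
                "s = 0; "
                "foreach (entry : _source.inventory) { "
                "  if (entry.warehouse == 'warehouse_a') { s = entry.stock; } "
                "} "
                "return s;"
            ),
            "type": "number",
            "order": "desc",
        }
    }
}
print(json.dumps(sort_clause))
```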

Thanks
Johan

On Wednesday, January 8, 2014 4:35:50 AM UTC, Jun Ohtani wrote:

 Hi Johan, 

 You could try script-based sorting: 

 http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/search-request-sort.html#_script_based_sorting
  

 Or the function score query. 

 http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/query-dsl-function-score-query.html#_script_score
  

 I hope this helps. 

 Regards, 

  
 Jun Ohtani 
 joh...@gmail.com 
 blog : http://blog.johtani.info 
 twitter : http://twitter.com/johtani 




 On 2014/01/07 at 19:45, Johan E (joha...@gmail.com) wrote: 

  Hi, 
  
  I'm trying to order the result of a query by a specified entry in an 
 array. 
  
  Here is a sample entry: 
  
  { 
    "product_name": "product alfa", 
    "product_id": "4a86c92ccd26111d7ba0eada7da6a75af", 
    "description": "This is a sample product", 
    "image_id": "product_a.jpg", 
    "inventory": [ 
      { "warehouse": "warehouse_a", "stock": 99 }, 
      { "warehouse": "warehouse_b", "stock": 19 }, 
      { "warehouse": "warehouse_c", "stock": 99 } 
    ] 
  } 
  
  If there were more products containing alfa, I would (for example) 
 want to sort them by the stock of a warehouse. 
  
  I'm currently using a query like: 
  
  POST _search 
  { 
    "query": { 
      "match": { 
        "product_name": { 
          "query": "alfa", 
          "type": "phrase" 
        } 
      } 
    }, 
    "filter": { 
      "bool": { 
        "must": [ 
          { 
            "term": { 
              "availability.warehouse": "warehouse_a" 
            } 
          } 
        ] 
      } 
    } 
  } 
  
  I would like the results sorted by stock (for warehouse_a only) 
 descending. 
  
  Any ideas? 
  
  





Is it possible to do a text query against a (pre)defined set of fields?

2014-01-08 Thread Ville Mattila
Hi,

I am indexing some large documents in an index. When making full-text 
queries, I've generally used {"text": {"_all": "some text search"}} to find 
all possible results. However, the documents contain a few private fields 
that should be queryable only by a certain user group.

What I was wondering is whether there is a way to define some kind of 
alias for a set of fields (or, even better, for all fields except a given 
set) in the mapping definition. I could then run a query like {"text": 
{"alias_for_public_fields": "some text search"}} and the private fields 
would not be searched. I do not know if this is already possible?

I know that it's possible to list all the fields in the query and leave out 
the private ones, but as there can be hundreds of fields that should be 
queryable and only 2-3 private fields, listing fields explicitly adds 
significant overhead to the queries.
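For reference, the explicit-listing workaround I mean is sketched below (the field names are made up); the fields list is exactly what I would like to replace with an alias:

```python
import json

# Sketch of the explicit-field workaround: a multi-field query over
# every public field. Field names are hypothetical; with hundreds of
# public fields this list becomes the overhead described above.
public_fields = ["title", "description", "comments"]  # ...hundreds more
query = {
    "query": {
        "multi_match": {
            "query": "some text search",
            "fields": public_fields,
        }
    }
}
print(json.dumps(query))
```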

Best regards,
Ville



Re: How many metadata fields exist of MP3 file ?

2014-01-08 Thread HongXuan Ji
Hi David,

I only got the ALBUM field by using the endpoint of Solr, which is 
HOST/solr/update/extract?extractOnly=true.
So it seems the mapper attachment does not support the extra field 
extraction. right?

BTW, can you give me some tutorial about the fsriver? I am also curious 
what's the plugin for ? What's the purpose of the plugin?

Best,

Ivan

David Pilato於 2014年1月8日星期三UTC+8下午6時23分03秒寫道:

 I would recommend not to use the mapper attachment but to manage that on 
 your side.
 I removed for example mapper attachment from fsriver project to have a 
 finer control. (see https://github.com/dadoonet/fsriver/issues/38)

 BTW, I'm not aware on how you can get ALBUM field using Tika. Any pointer? 
 Could be nice to add it to fsriver as well.

 -- 
 *David Pilato* | *Technical Advocate* | *Elasticsearch.com*
 @dadoonet https://twitter.com/dadoonet | 
 @elasticsearchfrhttps://twitter.com/elasticsearchfr


 Le 8 janvier 2014 at 10:49:47, HongXuan Ji (hxu...@gmail.com javascript:) 
 a écrit:

 Thanks for the reply. 

 Except for the six standard fields, I also want to know the extra field. 
 For example, in Solr we can extract the album field in MP3 file.
 Does this function also support in ElasticSearch? I just tested: I post a 
 mp3 file into ES, but the fields of the mp3 file contains only the six 
 fields.

 Ideas?

 Thanks a lot.

 David Pilato於 2014年1月8日星期三UTC+8下午4時34分07秒寫道: 

  Have a look at 
 https://github.com/elasticsearch/elasticsearch-mapper-attachments/blob/master/src/main/java/org/elasticsearch/index/mapper/attachment/AttachmentMapper.java#L376
  
  You will see that mapper attachment reads:
  
  Metadata.DATE
  Metadata.TITLE
  Metadata.AUTHOR
  Metadata.KEYWORDS
  Metadata.CONTENT_TYPE
  Metadata.CONTENT_LENGTH
  
  Does it help?

  -- 
 *David Pilato* | *Technical Advocate* | *Elasticsearch.com* 
  @dadoonet https://twitter.com/dadoonet | 
 @elasticsearchfrhttps://twitter.com/elasticsearchfr
  

 Le 8 janvier 2014 at 05:05:10, HongXuan Ji (hxu...@gmail.com) a écrit:

  Hi all, 

 I am wondering which metadata fields of an MP3 file exist when I post the 
 file into Elasticsearch using the mapper-attachment plugin. 

 In Solr we can get the field information through the endpoint 
 SOLR_HOST/update/extract?extractOnly=true, 

 but in Elasticsearch is there any way to get such information? And besides 
 MP3 files, what about doc files? 

 I know Elasticsearch uses Tika to support this; can you give me an example 
 of fetching a particular field from a particular file format?

 Regards,

 Ivan 


  --
 You received this message because you are subscribed to the Google Groups 
 elasticsearch group.
 To unsubscribe from this group and stop receiving emails from it, send an 
 email to elasticsearc...@googlegroups.com.
 To view this discussion on the web visit 
 https://groups.google.com/d/msgid/elasticsearch/742f86b9-9dd8-4354-ae50-26332f0c4dc0%40googlegroups.com
 .
 For more options, visit https://groups.google.com/groups/opt_out.
  
   --



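A minimal sketch of how a file is usually handed to the mapper-attachments type: the raw bytes go into the mapped field as a base64 string, from which Tika extracts the six metadata fields listed above (the `attachment_doc` helper and the "file" field name are illustrative, not part of the plugin's API):

```python
import base64
import json

def attachment_doc(path):
    """Build the JSON body mapper-attachments expects: the raw file
    bytes, base64-encoded, placed in the field mapped as "attachment"."""
    with open(path, "rb") as f:
        encoded = base64.b64encode(f.read()).decode("ascii")
    # "file" is a hypothetical field name; date/title/author/keywords/
    # content_type/content_length are then extracted server-side by Tika.
    return json.dumps({"file": encoded})
```

Anything beyond those six fields (such as ALBUM) would, per this thread, need to be extracted client-side before indexing.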


Re: Please help me: I insert logs to Elasticsearch, but it uses too much memory; how can I solve it? Thanks

2014-01-08 Thread xjj210130


On Wednesday, January 8, 2014 6:58:02 PM UTC+8, xjj2...@gmail.com wrote:

 Dear all: 
    I insert 1 logs into Elasticsearch; each log is about 2M, and 
 there are about 3000 keys and values.
  When I have inserted about 2, it uses about 30G of memory, and then 
 Elasticsearch becomes very slow and it is hard to insert logs.
  Could someone help me solve this? Thanks very much.



The following is my log format:
{
  "user1": [{"costprice": 122}, {"sellprice": 124}, {"stock": 12}, {"sell": 122}, {}, {}],
  ...
}
 



Re: Please help me: I insert logs to Elasticsearch, but it uses too much memory; how can I solve it? Thanks

2014-01-08 Thread xjj210130


On Wednesday, January 8, 2014 6:58:02 PM UTC+8, xjj2...@gmail.com wrote:

 Dear all: 
    I insert 1 logs into Elasticsearch; each log is about 2M, and 
 there are about 3000 keys and values.
  When I have inserted about 2, it uses about 30G of memory, and then 
 Elasticsearch becomes very slow and it is hard to insert logs.
  Could someone help me solve this? Thanks very much.


The following is my log format:
{
  "user1": [{"costprice": 122}, {"sellprice": 124}, {"stock": 12}, {"sell": 122}, {}, {}],
  ...
  "product": [{}],
  "name": []
}

There is information for about 4000~ 1 users, so a log may be about 2M.
 Thanks



Re: Please help me: I insert logs to Elasticsearch, but it uses too much memory; how can I solve it? Thanks

2014-01-08 Thread David Pilato
Do you insert that using bulk?

-- 
David Pilato | Technical Advocate | Elasticsearch.com
@dadoonet | @elasticsearchfr


On 8 January 2014 at 12:29:33, xjj210...@gmail.com (xjj210...@gmail.com) wrote:



On Wednesday, January 8, 2014 6:58:02 PM UTC+8, xjj2...@gmail.com wrote:
Dear all:
       I insert 1 logs into Elasticsearch; each log is about 2M, and there 
are about 3000 keys and values.
 When I have inserted about 2, it uses about 30G of memory, and then 
Elasticsearch becomes very slow and it is hard to insert logs.
 Could someone help me solve this? Thanks very much.

The following is my log format:
{
  "user1": [{"costprice": 122}, {"sellprice": 124}, {"stock": 12}, {"sell": 122}, {}, {}],
  ...
  "product": [{}],
  "name": []
}

There is information for about 4000~ 1 users, so a log may be about 2M.
 Thanks



Re: How many metadata fields exist of MP3 file ?

2014-01-08 Thread David Pilato
Mapper attachment does not support extra field extraction. Maybe you could 
open an issue there: 
https://github.com/elasticsearch/elasticsearch-mapper-attachments 

About FSRiver, I guess everything is described here: 
https://github.com/dadoonet/fsriver#filesystem-river-for-elasticsearch
Is there something you don't understand?


-- 
David Pilato | Technical Advocate | Elasticsearch.com
@dadoonet | @elasticsearchfr


On 8 January 2014 at 12:24:11, HongXuan Ji (hxua...@gmail.com) wrote:

Hi David,

I only got the ALBUM field by using Solr's endpoint, which is 
HOST/solr/update/extract?extractOnly=true.
So it seems the mapper attachment does not support extra field extraction, 
right?

BTW, can you point me to a tutorial about fsriver? I am also curious what 
the plugin is for. What is its purpose?

Best,

Ivan




Re: Please help me: I insert logs to Elasticsearch, but it uses too much memory; how can I solve it? Thanks

2014-01-08 Thread xjj210130


On Wednesday, January 8, 2014 7:30:21 PM UTC+8, David Pilato wrote:

 Do you insert that using bulk?

 -- 
 *David Pilato* | *Technical Advocate* | *Elasticsearch.com*
 @dadoonet https://twitter.com/dadoonet | @elasticsearchfr https://twitter.com/elasticsearchfr



 No, I insert the logs one by one, using Thrift to transport them. I set 
heap_size=30G; when I insert 2, it uses 30G of memory. I don't change 
elasticsearch.yml except for heap_size and thrift.frame (for most values I 
use the defaults). Thanks, 



Re: How many metadata fields exist of MP3 file ?

2014-01-08 Thread HongXuan Ji
OK, I will post the issue later.

About the river, 

The first line says: "This river plugin helps to index documents from your local 
file system and using SSH."

Does it mean that if I store a bunch of PDF files in a local directory, I can 
use the river plugin to search the files in that directory?

In fact, I started to study Elasticsearch this week and I am not very 
familiar with what "filesystem" means here.
Thanks a lot.

Ivan
On Wednesday, January 8, 2014 at 7:32:17 PM UTC+8, David Pilato wrote:

 Mapper attachment does not support extra field extraction. Maybe you 
 could open an issue there: 
 https://github.com/elasticsearch/elasticsearch-mapper-attachments 

 About FSRiver, I guess everything is described here: 
 https://github.com/dadoonet/fsriver#filesystem-river-for-elasticsearch
 Is there something you don't understand?


 -- 
 *David Pilato* | *Technical Advocate* | *Elasticsearch.com*
 @dadoonet https://twitter.com/dadoonet | @elasticsearchfr https://twitter.com/elasticsearchfr



Re: Please help me: I insert logs to Elasticsearch, but it uses too much memory; how can I solve it? Thanks

2014-01-08 Thread David Pilato
That was not really my question: are you using the bulk feature?

-- 
David Pilato | Technical Advocate | Elasticsearch.com
@dadoonet | @elasticsearchfr


On 8 January 2014 at 12:38:00, xjj210...@gmail.com (xjj210...@gmail.com) wrote:

The Elasticsearch version I use is 0.90.2.




Re: How many metadata fields exist of MP3 file ?

2014-01-08 Thread David Pilato
Yes. It indexes the documents available on your local hard drive.

-- 
David Pilato | Technical Advocate | Elasticsearch.com
@dadoonet | @elasticsearchfr


On 8 January 2014 at 12:42:56, HongXuan Ji (hxua...@gmail.com) wrote:

OK, I will post the issue later.

About the river, 

The first line says: "This river plugin helps to index documents from your local 
file system and using SSH."

Does it mean that if I store a bunch of PDF files in a local directory, I can 
use the river plugin to search the files in that directory?

In fact, I started to study Elasticsearch this week and I am not very 
familiar with what "filesystem" means here.
Thanks a lot.

Ivan
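For reference, the fsriver README shows a river being created by PUTting a meta document to `_river/<name>/_meta`; a minimal example of that document follows (the path and update rate are placeholder values, not recommendations):

```json
{
  "type": "fs",
  "fs": {
    "url": "/path/to/docs",
    "update_rate": 3600000
  }
}
```

With this in place, the plugin periodically scans the directory and indexes the files it finds, which is what makes them searchable.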

Re: Please help me: I insert logs to Elasticsearch, but it uses too much memory; how can I solve it? Thanks

2014-01-08 Thread xjj210130
No, I don't use bulk. You mean that using bulk may solve the problem? Thanks

On Wednesday, January 8, 2014 7:43:41 PM UTC+8, David Pilato wrote:

 That was not really my question. Are you using BULK feature?

 -- 
 *David Pilato* | *Technical Advocate* | *Elasticsearch.com*
 @dadoonet https://twitter.com/dadoonet | @elasticsearchfr https://twitter.com/elasticsearchfr


 On 8 January 2014 at 12:38:00, xjj2...@gmail.com (xjj2...@gmail.com) wrote:

 The Elasticsearch version I use is 0.90.2.




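A minimal sketch of the bulk format discussed above: the `_bulk` endpoint takes newline-delimited JSON, one action line followed by one source line per document, which cuts per-document request overhead compared with one-by-one indexing (the `bulk_body` helper and the index/type names are illustrative):

```python
import json

def bulk_body(index, doc_type, docs):
    """Build an NDJSON body for POST /_bulk: for each document,
    one action line followed by one source line."""
    lines = []
    for doc in docs:
        lines.append(json.dumps({"index": {"_index": index, "_type": doc_type}}))
        lines.append(json.dumps(doc))
    return "\n".join(lines) + "\n"  # _bulk requires a trailing newline

body = bulk_body("logs", "log", [{"user1": [{"costprice": 122}]}])
```

Sending batches of a few hundred documents per request is the usual starting point; whether it helps the memory pressure described in this thread would still need to be measured.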


memory surges in client app when a node dies

2014-01-08 Thread nicolas . long
Hi all,

I have a situation where if a node in our cluster dies (for whatever 
reason) the client app experiences a surge in memory usage, full GCs, and 
essentially dies.

I think this is because the client holds on to the connections for a while 
before realising the node is dead.

Does this sound possible? And does anyone have tips for how to deal with 
this? My thinking so far is:

1. More memory

2. A circuit-breaker pattern or some such to make sure the app disconnects 
quicker when ES is not responding

But are there ways to configure the ES client to improve the behaviour here?

Thanks,

Nic

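A few client-side settings that influence how quickly dead nodes are dropped from the connection pool (assuming the Java TransportClient on a 0.90.x cluster; the values below are illustrative, not recommendations):

```
client.transport.ping_timeout: 2s            # drop unresponsive nodes sooner
client.transport.nodes_sampler_interval: 2s  # refresh the connected-nodes list more often
client.transport.sniff: true                 # sample the cluster state to find live nodes
```

Tightening these on the client is one way to get the "disconnect quicker" behaviour described in point 2 without building a separate circuit breaker.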


Re: Please help me: I insert logs to Elasticsearch, but it uses too much memory; how can I solve it? Thanks

2014-01-08 Thread xjj210130
The Elasticsearch version I use is 0.90.2.

On Wednesday, January 8, 2014 7:30:21 PM UTC+8, David Pilato wrote:

 Do you insert that using bulk?

 -- 
 *David Pilato* | *Technical Advocate* | *Elasticsearch.com*
 @dadoonet https://twitter.com/dadoonet | @elasticsearchfr https://twitter.com/elasticsearchfr







Re: Please help me: I insert logs to Elasticsearch, but it uses too much memory; how can I solve it? Thanks

2014-01-08 Thread xjj210130
I only insert the log to elasticsearch.   I will do the following wrok:
 1:  write the data to elasticsearch.
 2: Then to search the data.

Now, when i insert the data to es, It used too much memory. I wonder why 
the es use so much memory.
Could you give me some suggestions. Thanks

 I use jmap to watch the pid. The result is as follows (I changed the 
heap_size to 1G to watch the memory use):

num       #instances    #bytes      class description
--
1:  229353  18348240java.util.WeakHashMap$Entry[]
2:  229353  12843768java.util.WeakHashMap
3:  145045  8703384 org.elasticsearch.index.mapper.FieldMapper[]
4:  229353  7339296 java.lang.ref.ReferenceQueue
5:  235890  5661360 
org.elasticsearch.common.collect.RegularImmutableMap$TerminalEntry
6:  229346  5504304 org.apache.lucene.util.CloseableThreadLocal
7:  57303   4125816 
org.elasticsearch.index.mapper.core.LongFieldMapper
8:  85939   3836608 char[]
9:  155465  3731160 
org.elasticsearch.common.collect.RegularImmutableMap$NonTerminalEntry
10: 229353  3669648 java.lang.ThreadLocal
11: 229353  3669648 java.lang.ref.ReferenceQueue$Lock
12: 229353  3669648 java.util.concurrent.atomic.AtomicInteger
13: 114662  3669184 
org.elasticsearch.index.analysis.NamedAnalyzer
14: 28698   3518912 
org.elasticsearch.common.collect.RegularImmutableMap$LinkedEntry[]
15: 145044  3481056 java.util.Arrays$ArrayList
16: 145044  3481056 org.elasticsearch.index.mapper.FieldMappers
17: 114620  2750880 
org.elasticsearch.index.analysis.NumericLongAnalyzer
18: 52044   2081760 org.apache.lucene.document.FieldType
19: 85939   2062536 java.lang.String
20: 57499   1839968 
org.elasticsearch.index.mapper.FieldMapper$Names
21: 114683  1834928 
org.apache.lucene.analysis.Analyzer$PerFieldReuseStrategy
22: 114662  1834592 
org.apache.lucene.analysis.Analyzer$GlobalReuseStrategy
23: 57493   1379832 
org.elasticsearch.index.fielddata.FieldDataType
24: 57332   1375968 
org.elasticsearch.index.mapper.core.NumberFieldMapper$1
25: 57303   1375272 org.elasticsearch.common.Explicit
26: 14321   1267344 byte[]
27: 37088   1186816 java.util.HashMap$Entry
28: 14300   915200 
 org.elasticsearch.index.mapper.object.ObjectMapper
29: 2180660520  java.lang.Object[]
30: 14349   573960 
 org.elasticsearch.common.collect.RegularImmutableMap
31: 16458   526656 
 org.elasticsearch.common.collect.RegularImmutableList
32: 14314   343536  org.apache.lucene.index.Term
33: 14314   343536  org.apache.lucene.util.BytesRef
34: 14293   343032 
 org.elasticsearch.common.collect.RegularImmutableMap$EntrySet
35: 14293   343032 
 org.elasticsearch.common.collect.RegularImmutableAsList
36: 14293   343032 
 org.elasticsearch.common.collect.ImmutableMapValues
37: 8   279936  java.util.HashMap$Entry[]
38: 14314   229024  java.lang.Object
39: 14314   229024 
 org.elasticsearch.common.lucene.search.TermFilter
40: 216451936   org.elasticsearch.index.mapper.ObjectMappers
41: 1   16400   java.lang.String[]
42: 119 8568   
 org.elasticsearch.index.mapper.core.StringFieldMapper
43: 1   8208   
 org.elasticsearch.common.jackson.core.sym.CharsToNameCanonicalizer$Bucket[]
44: 28  1120   
 org.elasticsearch.common.collect.SingletonImmutableBiMap
45: 14  728 org.elasticsearch.index.mapper.RootMapper[]
46: 7   728 
org.elasticsearch.index.mapper.DocumentMapper
47: 7   672 
org.elasticsearch.index.mapper.internal.TimestampFieldMapper
48: 28  672 
org.elasticsearch.common.collect.SingletonImmutableSet
49: 7   616 
org.elasticsearch.index.mapper.internal.TTLFieldMapper
50: 7   560 
org.elasticsearch.index.mapper.internal.SourceFieldMapper
51: 7   560 
org.elasticsearch.index.mapper.internal.SizeFieldMapper
52: 7   504 
org.elasticsearch.index.mapper.object.RootObjectMapper
53: 7   504 
org.elasticsearch.index.mapper.internal.BoostFieldMapper
54: 21  504 
org.elasticsearch.index.analysis.FieldNameAnalyzer
55: 14  448 
java.util.concurrent.locks.ReentrantLock$NonfairSync
56: 7   392 
org.elasticsearch.index.mapper.internal.UidFieldMapper
57: 7   392 
org.elasticsearch.index.mapper.internal.IdFieldMapper
58: 7   392 

Re: Order results by value in one of the array entries.

2014-01-08 Thread Johan E
I ended up changing the format of the json, with warehouse stock in 
separate entries in an array. This way I can check for it and get the stock 
at the same time.



Re: memory surges in client app when a node dies

2014-01-08 Thread nicolas . long
We're using the Java transport client.

The problem only happens when the app is dealing with a high number of 
requests. I wondered whether it was because the client takes a little bit 
of time to detect that the node is unavailable: potentially up to 10 
seconds in total (with default settings - 5 seconds to ping the node, 
another 5 for the timeout).

And perhaps even after the node has been dropped the existing connections 
to the node still need to timeout (not sure what the default is here)?

On Wednesday, 8 January 2014 13:19:29 UTC, Jason Wee wrote:

 It should not be possible, right? If you configure the client app to have two 
 or more elasticsearch nodes, it should detect if an elasticsearch node is down 
 and not use it during indexing/querying.

 What client are you using?

 Jason


 On Wed, Jan 8, 2014 at 7:48 PM, nicola...@guardian.co.uk wrote:

 Hi all,

 I have a situation where if a node in our cluster dies (for whatever 
 reason) the client app experiences a surge in memory usage, full GCs, and 
 essentially dies.

 I think this is because the client holds on to the connections for a 
  while before realising the node is dead.

 Does this sound possible? And does anyone have tips for how to deal with 
 this. My thinking so far is:

 1. More memory

 2. A circuit-breaker pattern or some such to make sure the app 
 disconnects quicker when ES is not responding

 But are there ways to configure the ES client to improve the behaviour 
 here?

 Thanks,

 Nic







Re: memory surges in client app when a node dies

2014-01-08 Thread nicolas . long
I think you probably replied just after mine!

We are using the transport client yes. And to clarify, ES itself is fine 
during these periods. It is the client app that has problems.

On Wednesday, 8 January 2014 13:34:29 UTC, Jörg Prante wrote:

 Have you tried TransportClient? TransportClient does not share the heap 
 memory with a cluster node. The setting client.transport.ping_timeout 
 checks if the nodes connected still respond. By default, it is 5 seconds, I 
 use values up to 30 seconds to survive long GCs without disconnects.

 Jörg





Re: Beta2 Java Client: java.nio.channels.UnresolvedAddressException

2014-01-08 Thread joergpra...@gmail.com
There are FQDNs like vcll36a-1001.equity.csfb.com which cannot be resolved
by your DNS settings, it seems.

14 eth interfaces are quite a lot to try to connect to; I would reduce
them using the network interface eth alias names ES provides:
http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/modules-network.html

Jörg



Re: memory surges in client app when a node dies

2014-01-08 Thread joergpra...@gmail.com
ES TransportClient uses a RetryListener which is a bit flaky in case of
exceptions caused by faulty nodes. Some users reported an explosion of port
use and connection retries, and this may also bring the client memory to a
limit. Maybe you have stack traces that show abnormal behavior, so it's
worth raising a GitHub issue?

Jörg



No hit using scan/scroll with has_parent filter

2014-01-08 Thread Jean-Baptiste Lièvremont
Hi folks,

I use a parent/child mapping configuration which works flawlessly with 
classic search requests, e.g. using has_parent to find child documents 
with criteria on the parent documents.

I am trying to get all child document IDs that match a given set of 
criteria using scan and scroll, which also works well - until I introduce 
the has_parent filter, in which case the scroll request returns no hit 
(although total_hits is correct).

Is it a known issue?

I can provide sample mapping files and queries with associated/expected 
results. Please note that this behavior has been noticed on 0.90.6 but is 
still present in 0.90.9.

Thanks, best regards,
-- Jean-Baptiste Lièvremont
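For anyone trying to reproduce, the failing combination would look roughly like this (a sketch with hypothetical type and field names), sent with search_type=scan&scroll=1m:

```json
{
  "query": {
    "filtered": {
      "query": { "match_all": {} },
      "filter": {
        "has_parent": {
          "parent_type": "parent_doc",
          "query": { "term": { "status": "active" } }
        }
      }
    }
  }
}
```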



Re: More like this scoring algorithm unclear

2014-01-08 Thread Justin Treher
Hey Maarten,

I would use the explain:true option to see just why your documents are 
being scored higher than others. MoreLikeThis using the same fulltext 
scoring as far as I know, so term position would affect score. 

http://lucene.apache.org/core/3_0_3/api/contrib-queries/org/apache/lucene/search/similar/MoreLikeThis.html

Justin

On Wednesday, January 8, 2014 3:04:47 AM UTC-5, Maarten Roosendaal wrote:

 Hi,

 I have a question about why the 'more like this' algorithm scores 
 documents higher than others, while they are (at first glance) the same.

 What i've done is index wishlist-documents which contain 1 property: 
 product_id, this property contains an array of product_id's (e.g. [1234, 
 , , ]. What I'm trying to do is find similar wishlists for a 
 given wishlist with id x. The MLT API seems to work, it returns other 
 documents which contain at least 1 of the product_id's from the original 
 list.

 But what I see is that, for example, I get 10 hits; the first 6 hits 
 contain the same (and only 1) product_id, and this product_id is present in the 
 original wishlist. What I would expect is that the score of the first 6 is 
 the same. However, what I see is that only the first 2 have the same score, the 
 next 2 a lower score, and the next 2 even lower. Why is this?

 Also, I'm trying to write the MLT API call as an MLT query, but somehow it 
 doesn't work. I would expect that I need to take the entire content of the 
 original product_id property and feed it as input for the 'like_text'. The 
 documentation is not very clear and doesn't provide examples, so I'm a 
 little lost.

 Hope someone can give some pointers.

 Thanks,
 Maarten
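On the second question, a rough sketch of the MLT API call rewritten as a more_like_this query (the like_text ids are hypothetical placeholders; in practice like_text would be the space-joined product_ids of the source wishlist):

```json
{
  "query": {
    "more_like_this": {
      "fields": ["product_id"],
      "like_text": "1234 5678 9012",
      "min_term_freq": 1,
      "min_doc_freq": 1
    }
  }
}
```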




Searching indexed fields without analysing

2014-01-08 Thread Chris H
Hi.  I've deployed elasticsearch with logstash and kibana to take in 
Windows logs from my OSSEC log server, following this guide: 
http://vichargrave.com/ossec-log-management-with-elasticsearch/
I've tweaked the logstash config to extract some specific fields from the 
logs, such as User_Name.  I'm having some issues searching on these fields 
though.

These searches work as expected:

   - User_Name: * 
   - User_Name: john.smith
   - User_Name: john.*
   - NOT User_Name: john.*

But I'm having problems with Computer accounts, which take the format 
"w-dc-01$" - they're being split on the "-" and the "$" is ignored. So a 
search for "w-dc-01" returns all the servers named "w-"anything. Also I 
can't do NOT User_Name: *$ to exclude computer accounts.

The mappings are created automatically by logstash, and GET 
/logstash-2014.01.08/_mapping shows:

"User_Name": {

   "type": "multi_field",
   "fields": {
      "User_Name": {
         "type": "string",
         "omit_norms": true
      },
      "raw": {
         "type": "string",
         "index": "not_analyzed",
         "omit_norms": true,
         "index_options": "docs",
         "include_in_all": false,
         "ignore_above": 256
      }
   }
},

My (limited) understanding is that the not_analyzed should stop the field 
being split, so that my searching matches the full name, but it doesn't.  
I'm trying both kibana and curl to get results.

Hope this makes sense.  I really like the look of elasticsearch, but being 
able to search on extracted fields like this is pretty key to me using it.

Thanks.
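For what it's worth, the not_analyzed copy in a multi_field mapping is addressed as a sub-field, so an exact, unsplit match can target User_Name.raw. A sketch of such a term query (value taken from the example above):

```json
{
  "query": {
    "term": { "User_Name.raw": "w-dc-01$" }
  }
}
```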




Re: How to query custom rest handler in elastic search using Java api

2014-01-08 Thread Ivan Brusic
The CustomRestAction code you posted contains *exactly* the Java code you
need to execute the same action as the REST action.

If you still want to use the REST URL, you cannot use the
elasticsearch libraries. /_mastering/nodes is not a valid search type.
The action does not even execute a query technically, but retrieves node
level information.

Cheers,

Ivan


On Wed, Jan 8, 2014 at 12:46 AM, Shishir Kumar shishir.su...@gmail.com wrote:

 Hi,

 I am not facing any issue with the NodesInfoAction or the custom endpoint
 code. The rest endpoint is working fine if I curl to it using: curl -XGET 
 'localhost:9200/_mastering/nodes?pretty'.

 I am trying to find out a way to do this from an embedded node. In other
 words, somthing like below:

 Node node = NodeBuilder.nodeBuilder().clusterName("elasticsearch").node();
 Client client = node.client();
 SearchResponse response = client.prepareSearch()
     .setSearchType("/_mastering/nodes")
     .setQuery(QueryBuilders.queryString(""))
     .execute().actionGet();

 P.S. the code snippet doesn't actually work. But I want to query the
 /_mastering/nodes through Java api.


 On Friday, 3 January 2014 13:35:13 UTC+5:30, Jörg Prante wrote:

 You have wrapped a NodesInfoAction, so all you have to do is

 NodesInfoResponse response = client.admin().cluster().prepareNodesInfo().all().execute().actionGet();

 That is the Java API.

 Jörg





Odd hot MVEL

2014-01-08 Thread Nikolas Everett
Does anyone know what might be causing MVEL to do this:
   100.3% (501.3ms out of 500ms) cpu usage by thread
'elasticsearch[elastic1002][search][T#23]'
 9/10 snapshots sharing following 47 elements
   java.lang.Throwable.fillInStackTrace(Native Method)
   java.lang.Throwable.fillInStackTrace(Throwable.java:782)
   java.lang.Throwable.<init>(Throwable.java:265)
   java.lang.Exception.<init>(Exception.java:66)
   java.lang.RuntimeException.<init>(RuntimeException.java:62)

java.lang.IllegalArgumentException.<init>(IllegalArgumentException.java:53)
   sun.reflect.GeneratedMethodAccessor45.invoke(Unknown Source)

sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
   java.lang.reflect.Method.invoke(Method.java:606)

org.elasticsearch.common.mvel2.optimizers.impl.refl.nodes.GetterAccessor.getValue(GetterAccessor.java:43)

org.elasticsearch.common.mvel2.optimizers.impl.refl.nodes.MapAccessorNest.getValue(MapAccessorNest.java:54)

org.elasticsearch.common.mvel2.optimizers.impl.refl.nodes.VariableAccessor.getValue(VariableAccessor.java:37)

org.elasticsearch.common.mvel2.ast.ASTNode.getReducedValueAccelerated(ASTNode.java:108)

org.elasticsearch.common.mvel2.MVELRuntime.execute(MVELRuntime.java:86)

org.elasticsearch.common.mvel2.compiler.CompiledExpression.getDirectValue(CompiledExpression.java:123)

org.elasticsearch.common.mvel2.compiler.CompiledExpression.getValue(CompiledExpression.java:119)

org.elasticsearch.common.mvel2.compiler.CompiledExpression.getValue(CompiledExpression.java:106)

org.elasticsearch.common.mvel2.ast.Substatement.getReducedValueAccelerated(Substatement.java:44)

org.elasticsearch.common.mvel2.ast.BinaryOperation.getReducedValueAccelerated(BinaryOperation.java:114)

org.elasticsearch.common.mvel2.ast.BinaryOperation.getReducedValueAccelerated(BinaryOperation.java:114)

org.elasticsearch.common.mvel2.compiler.ExecutableAccessor.getValue(ExecutableAccessor.java:42)

org.elasticsearch.common.mvel2.optimizers.impl.refl.nodes.MethodAccessor.executeAndCoerce(MethodAccessor.java:164)

org.elasticsearch.common.mvel2.optimizers.impl.refl.nodes.MethodAccessor.getValue(MethodAccessor.java:73)

org.elasticsearch.common.mvel2.ast.ASTNode.getReducedValueAccelerated(ASTNode.java:108)

org.elasticsearch.common.mvel2.MVELRuntime.execute(MVELRuntime.java:86)

org.elasticsearch.common.mvel2.compiler.CompiledExpression.getDirectValue(CompiledExpression.java:123)

org.elasticsearch.common.mvel2.compiler.CompiledExpression.getValue(CompiledExpression.java:119)

org.elasticsearch.script.mvel.MvelScriptEngineService$MvelSearchScript.run(MvelScriptEngineService.java:191)

org.elasticsearch.script.mvel.MvelScriptEngineService$MvelSearchScript.runAsDouble(MvelScriptEngineService.java:206)

org.elasticsearch.common.lucene.search.function.ScriptScoreFunction.score(ScriptScoreFunction.java:54)

It isn't an error.  Looking at MVEL's source it looks like it catches this
error and works around it by inspecting the function, casting the arguments
appropriately, and then retrying.  I imagine it'd be nice and fast if I
didn't get the types wrong but it works anyway which feels a bit trappy at
scale.

I know this is caused by scoring tons of documents in a FunctionScore which
is a pretty strong argument for moving all FunctionScoring into a rescore
for protection but what in the world am I doing with MVEL to make it do
this?

My candidate MVEL looks like this:
log10( ($doc['a'].empty ? 0 : $doc['a']) + ($doc['b'].empty ? 0 :
$doc['b']) + 2 )


I'm trying to reproduce it with the debugger and Elasticsearch's tests but
I haven't had any luck yet so I'd love to hear if anyone else has seen this.

Nik
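As a sanity check on the arithmetic, the candidate script can be mirrored in plain Python (a sketch; the field names and the missing-field handling are taken from the MVEL above, and the boxed numeric types involved on the JVM side are not modeled here):

```python
import math

def mvel_score(a=None, b=None):
    # Mirrors: log10( ($doc['a'].empty ? 0 : $doc['a'])
    #               + ($doc['b'].empty ? 0 : $doc['b']) + 2 )
    # A missing field contributes 0; the +2 keeps the log argument >= 2.
    a_val = 0 if a is None else a
    b_val = 0 if b is None else b
    return math.log10(a_val + b_val + 2)

print(mvel_score())       # both fields missing -> log10(2)
print(mvel_score(98, 0))  # -> 2.0
```

This confirms the formula itself is well defined for empty fields; the repeated IllegalArgumentException is purely a JVM-side reflection detail, as described above.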



Re: Upgrades causing Elastic Search downtime

2014-01-08 Thread Jenny Sivapalan
Thanks both for the replies. Our rebalance process doesn't take too long 
(~5 mins per node). I had some of the plugins (head, paramedic, bigdesk) 
open as I was closing down the old nodes and didn't see any split brain 
issue although I agree we can lead ourselves down this route by doubling 
the instances. We want our cluster to rebalance as we bring nodes in and 
out so disabling is not going to work for us unless I'm misunderstanding? 


On Tuesday, 7 January 2014 22:16:46 UTC, Mark Walkom wrote:

 You can also use cluster.routing.allocation.disable_allocation to reduce 
 the need of waiting for things to rebalance.

 Regards,
 Mark Walkom

 Infrastructure Engineer
 Campaign Monitor
 email: ma...@campaignmonitor.com
 web: www.campaignmonitor.com
  

 On 8 January 2014 04:41, Ivan Brusic iv...@brusic.com wrote:

 Although elasticsearch should support clusters of nodes with different 
 minor versions, I have seen issues between minor versions. Version 0.90.8 
 did contain an upgrade of Lucene (4.6), but that does not look like it 
 would cause your issue. You could look at the github issues tagged 
 0.90.[8-9] and see if something applies in your case.

 A couple of points about upgrading:

 If you want to use the double-the-nodes techniques (which should not be 
 necessary for minor version upgrades), you could decommission a node 
 using the Shard API. Here is a good writeup: 
 http://blog.sematext.com/2012/05/29/elasticsearch-shard-placement-control/

 Since you doubled the amount of nodes in the cluster, 
 the minimum_master_nodes setting would be temporarily incorrect and 
 potential split-brain clusters might occur. In fact, it might have occurred 
 in your case since the cluster state seems incorrect. Merely hypothesizing.

 Cheers,

 Ivan 


 On Tue, Jan 7, 2014 at 9:26 AM, Jenny Sivapalan jennifer@gmail.com wrote:

 Hello,

 We've upgraded Elastic Search twice over the last month and have 
 experienced downtime (roughly 8 minutes) during the roll out. I'm not sure 
 if it is something we are doing wrong or not.

 We use EC2 instances for our Elastic Search cluster and cloud formation 
 to manage our stack. When we deploy a new version or change to Elastic 
 Search we upload the new artefact, double the number of EC2 instances and 
 wait for the new instances to join the cluster.

 For example 6 nodes form a cluster on v 0.90.7. We upload the 0.90.9 
 version via our deployment process and double the number nodes for the 
 cluster (12). The 6 new nodes will join the cluster with the 0.90.9 
 version. 

 We then want to remove each of the 0.90.7 nodes. We do this by shutting 
 down the node (using the plugin head), wait for the cluster to rebalance 
 the shards and then terminate the EC2 instances. Then repeat with the next 
 node. We leave the master node until last so that it does the re-election 
 just once.

 The issue we have found in the last two upgrades is that while the 
 penultimate node is shutting down the master starts throwing errors and the 
 cluster goes red. To fix this we've stopped the Elastic Search process on 
 master and have had to restart each of the other nodes (though perhaps they 
 would have rebalanced themselves in a longer time period?). We find 
 that we send an increase error response to our clients during this time.

 We've set our queue size for search to 300 and we start to see the queue 
 get full:
at java.lang.Thread.run(Thread.java:724)
 2014-01-07 15:58:55,508 DEBUG action.search.type[Matt Murdock] 
 [92036651] Failed to execute fetch phase
 org.elasticsearch.common.util.concurrent.EsRejectedExecutionException: 
 rejected execution (queue capacity 300) on 
 org.elasticsearch.action.search.type.TransportSearchQueryThenFetchAction$AsyncAction$2@23f1bc3
 at 
 org.elasticsearch.common.util.concurrent.EsAbortPolicy.rejectedExecution(EsAbortPolicy.java:61)
 at 
 java.util.concurrent.ThreadPoolExecutor.reject(ThreadPoolExecutor.java:821)


 But also we see the following error which we've been unable to find the 
 diagnosis for:
  2014-01-07 15:58:55,530 DEBUG index.shard.service   [Matt Murdock] 
 [index-name][4] Can not build 'doc stats' from engine shard state 
 [RECOVERING]
 org.elasticsearch.index.shard.IllegalIndexShardStateException: 
 [index-name][4] CurrentState[RECOVERING] operations only allowed when 
 started/relocated
 at 
 org.elasticsearch.index.shard.service.InternalIndexShard.readAllowed(InternalIndexShard.java:765)

  Are we doing anything wrong or has anyone experienced this? 

 Thanks,
 Jenny


Re: More like this scoring algorithm unclear

2014-01-08 Thread Maarten Roosendaal
Hi,

Thanks, i'm not quite sure how to do that. I'm using:
http://localhost:9200/lists/list/[id of 
list]/_mlt?mlt_field=product_id&min_term_freq=1&min_doc_freq=1

the body does not seem to be respected (i'm using the elasticsearch head 
plugin) if I add:
{
  "explain": true
}

i've been trying to rewrite the mlt api as an mlt query but no luck so far. 
Any suggestions?

Thanks,
Maarten
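If the _mlt endpoint won't take a body, one workaround sketch is to run the equivalent query through _search, where explain is honored as a top-level request option (the more_like_this_field syntax and the ids here are assumptions for the 0.90.x query DSL):

```json
{
  "explain": true,
  "query": {
    "more_like_this_field": {
      "product_id": {
        "like_text": "1234 5678",
        "min_term_freq": 1,
        "min_doc_freq": 1
      }
    }
  }
}
```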

On Wednesday, 8 January 2014 16:14:25 UTC+1, Justin Treher wrote:

 Hey Maarten,

 I would use the explain:true option to see just why your documents are 
  being scored higher than others. MoreLikeThis uses the same fulltext 
 scoring as far as I know, so term position would affect score. 


 http://lucene.apache.org/core/3_0_3/api/contrib-queries/org/apache/lucene/search/similar/MoreLikeThis.html

 Justin

 On Wednesday, January 8, 2014 3:04:47 AM UTC-5, Maarten Roosendaal wrote:

 Hi,

 I have a question about why the 'more like this' algorithm scores 
 documents higher than others, while they are (at first glance) the same.

 What i've done is index wishlist-documents which contain 1 property: 
 product_id, this property contains an array of product_id's (e.g. [1234, 
  , , ]. What I'm trying to do is find similar wishlists for a 
 given wishlist with id x. The MLT API seems to work, it returns other 
 documents which contain at least 1 of the product_id's from the original 
 list.

 But what I see is that, for example, I get 10 hits; the first 6 hits 
 contain the same (and only 1) product_id, and this product_id is present in the 
 original wishlist. What I would expect is that the score of the first 6 is 
 the same. However, what I see is that only the first 2 have the same score, the 
 next 2 a lower score, and the next 2 even lower. Why is this?

 Also, I'm trying to write the MLT API call as an MLT query, but somehow it 
 doesn't work. I would expect that I need to take the entire content of the 
 original product_id property and feed it as input for the 'like_text'. The 
 documentation is not very clear and doesn't provide examples, so I'm a 
 little lost.

 Hope someone can give some pointers.

 Thanks,
 Maarten





Timestamp and _timestamp

2014-01-08 Thread Николай Колев
Hi all,
I have an ES cluster with four nodes and 157 indexes. There are about 140 
million entries that occupy around 50 GB (1 primary index with one 
replica). There are 2 data nodes, one pure master, and one client node that 
serves as a gate for web requests.
In recent days I started to observe that the cluster becomes very unstable and 
every few hours one of the data servers stops unexpectedly. The only solution 
was to reboot all data nodes to be able to process future logging.
My mapping contains these definitions:
"Timestamp": {
    "type": "date",
    "format": "date_time"
}
and 
"_timestamp": { "enabled": true, "path": "Timestamp" },

After some tests I discovered that if I make a request filtering on 
Timestamp, the CPU load becomes very high and the cluster gets unstable. All 
incoming events are rejected.
But when I make requests filtering on _timestamp, everything works well as 
expected.

My question is: why is this happening, and what is the source of this 
behavior?
Any ideas how to fix it?

Thanks in advance,
Nickolay Kolev



Re: Is it possible to do a text query against a (pre)defined set of fields?

2014-01-08 Thread InquiringMind
Ville,

Perhaps: Don't include the private fields in _all. Then a query against 
_all would be restricted to the (perhaps hundreds) of public fields.

A query that includes the private fields would need to list _all and then 
the private fields. But since you have only 2 or 3 private fields, there 
shouldn't be much overhead on the query.

Brian



Can wildcard, matched fields have relevance scoring?

2014-01-08 Thread project2501
Hi,
  I am doing an 'exists' query on a field that is matched to a text field. 
 The results all come back with the same score.
Example:

_metadata:[* TO *]  // Match documents where this field exists

matched field ['text']

This searches only documents that contain a field called _metadata and 
highlights that field into 'text' field. 

I want the results to be ranked based on size of _metadata field or # of 
matches.

Is it possible?



Re: Is it possible to do a text query against a (pre)defined set of fields?

2014-01-08 Thread Ville Mattila
Hi,

Well - I think I've understood something wrong here. Isn't _all a special 
key that includes all indexed fields? Is there a possibility to change the 
fields included in _all?

Ville


Ville,

 Perhaps: Don't include the private fields in _all. Then a query against 
 _all would be restricted to the (perhaps hundreds) of public fields.

 A query that includes the private fields would need to list _all and then 
 the private fields. But since you have only 2 or 3 private fields, there 
 shouldn't be much overhead on the query.

 Brian




Re: Does the server support streaming?

2014-01-08 Thread joergpra...@gmail.com
You are correct: ES nodes consume data request by request before it is
passed on through the cluster. The same goes for bulk indexing requests:
they are temporarily pushed to buffers, but they are split by lines and
executed as single actions.

So to reduce network roundtrips, the best thing is to use the bulk API.
What is left is a few percent to optimize, which is not worth much. With
gzip, ES HTTP provides transparent compression. The main challenge is HTTP
overhead (headers can't be compressed), and base64, if you use binary data
with ES.

Please note that you must evaluate the bulk responses too, in order to
validate success at the document level.
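Jörg's point about evaluating the bulk response is worth a sketch: a bulk request can succeed as a whole while individual actions fail. The snippet below runs against a hand-written response in the general shape the bulk API returns (exact field names vary by ES version; this is not a live call):

```python
# A hand-written response in the general shape the bulk API returns;
# field names are assumptions, check them against your ES version.
response = {
    "took": 5,
    "items": [
        {"index": {"_index": "wishlists", "_id": "1", "ok": True}},
        {"index": {"_index": "wishlists", "_id": "2",
                   "error": "MapperParsingException[failed to parse]"}},
    ],
}

def failed_actions(bulk_response):
    """Collect per-document failures from a bulk response."""
    failures = []
    for item in bulk_response["items"]:
        # Each item wraps exactly one action: index, create, update or delete.
        for action, result in item.items():
            if "error" in result:
                failures.append((action, result.get("_id"), result["error"]))
    return failures

print(failed_actions(response))  # one failure expected, for _id "2"
```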

It is possible to extend the whole ES API also to Websocket, so besides
JSON over HTTP, it would also be possible to transfer JSON text frames or
SMILE/binary frames on a single bi-directional channel. HTTP must use two
channels for this, so with Websocket you can cut the connection resources
in half. In this sense, the Netty channel / REST / Java API could be
extended for special realtime WS streaming applications, like pubsub. I
experimented with that some time ago on ES 0.20:
https://github.com/jprante/elasticsearch-transport-websocket (needs
updating)

From what I understand, the thrift transport plugin compiles the ES API,
operates in a streaming-like fashion, and provides a solution that
reduces HTTP overhead:
https://github.com/elasticsearch/elasticsearch-transport-thrift

Jörg



Re: More like this scoring algorithm unclear

2014-01-08 Thread Maarten Roosendaal
The scoring algorithm is still vague to me, but I got the query to act like 
the API. The results are different, though, so I'm still doing something 
wrong. Here's an example:
{
  "explain": true,
  "query": {
    "more_like_this": {
      "fields": [
        "PRODUCT_ID"
      ],
      "like_text": "104004855475 1001004002067765 100200494210 1002004004499883",
      "min_term_freq": 1,
      "min_doc_freq": 1,
      "max_query_terms": 1,
      "percent_terms_to_match": 0.5
    }
  },
  "from": 0,
  "size": 50,
  "sort": [],
  "facets": {}
}

the like_text contains product_id's from a wishlist for which i want to 
find similar lists

On Wednesday, January 8, 2014, 16:50:53 UTC+1, Maarten Roosendaal wrote:

 Hi,

 Thanks, i'm not quite sure how to do that. I'm using:
 http://localhost:9200/lists/list/[id of 
 list]/_mlt?mlt_field=product_id&min_term_freq=1&min_doc_freq=1

 the body does not seem to be respected (i'm using the elasticsearch head 
 plugin) if i add:
 {
   "explain": true
 }

 i've been trying to rewrite the mlt api as an mlt query but no luck so 
 far. Any suggestions?

 Thanks,
 Maarten

On Wednesday, January 8, 2014, 16:14:25 UTC+1, Justin Treher wrote:

 Hey Maarten,

 I would use the explain:true option to see just why your documents are 
 being scored higher than others. MoreLikeThis uses the same fulltext 
 scoring as far as I know, so term position would affect the score. 


 http://lucene.apache.org/core/3_0_3/api/contrib-queries/org/apache/lucene/search/similar/MoreLikeThis.html

 Justin

 On Wednesday, January 8, 2014 3:04:47 AM UTC-5, Maarten Roosendaal wrote:

 Hi,

 I have a question about why the 'more like this' algorithm scores 
 documents higher than others, while they are (at first glance) the same.

 What i've done is index wishlist-documents which contain 1 property: 
 product_id, this property contains an array of product_id's (e.g. [1234, 
 , , ]. What i'm trying to do is find similar wishlists for a 
 given wishlist with id x. The MLT API seems to work, it returns other 
 documents which contain at least 1 of the product_id's from the original 
 list.

 But what i see is that, for example, i get 10 hits, and the first 6 hits 
 contain the same (and only 1) product_id; this product_id is present in the 
 original wishlist. What i would expect is that the score of the first 6 is 
 the same. However, what i see is that only the first 2 have the same score, 
 the next 2 a lower score, and the next 2 even lower. Why is this?

 Also, i'm trying to write the MLT API as an MLT query, but somehow it 
 doesn't work. I would expect that i need to take the entire content of the 
 original product_id property and feed it as input for the 'like_text'. The 
 documentation is not very clear and doesn't provide examples so i'm a 
 little lost.

 Hope someone can give some pointers.

 Thanks,
 Maarten





Elasticsearch Missing Data

2014-01-08 Thread Eric Luellen
Hello,

I've had my elasticsearch instance running for about a week with no issues, 
but last night it stopped working. When I went to look in Kibana, it stopped 
logging around 20:45 on 1/7/14. I then restarted the service on both 
elasticsearch servers and it started logging again and pulled back some 
logs from 07:10 that morning, even though I restarted the service around 
10:00. So my questions are:

1. Why did it stop working? I don't see any obvious errors.
2. When I restarted it, why did it pull back only some of the data and not 
all of it? I see that there are no unassigned shards.

curl -XGET 'http://localhost:9200/_cluster/health?pretty=true'
{
  "cluster_name" : "my-elasticsearch",
  "status" : "green",
  "timed_out" : false,
  "number_of_nodes" : 3,
  "number_of_data_nodes" : 2,
  "active_primary_shards" : 40,
  "active_shards" : 80,
  "relocating_shards" : 0,
  "initializing_shards" : 0,
  "unassigned_shards" : 0
}

Are there any additional queries or logs I can look at to see what is going 
on? 

On a slight side note, when I restarted my 2nd elasticsearch server it 
isn't reading from the /etc/elasticsearch.yml file like it should. It isn't 
creating the node name correctly or putting the data files in the spot I 
have configured. I'm using CentOS and doing everything via 
/etc/init.d/elasticsearch on both servers and the elasticsearch1 server 
reads everything correctly but elasticsearch2 does not.

Thanks for your help.
Eric



Re: Strategy for keeping Elasticsearch updated with MySQL

2014-01-08 Thread David Pilato
I would do 1/ to have more near-real-time search.
Also, I like the idea of having an object in memory and simply pushing it to 
MySQL and to ES at the same time. No need to read the object back from MySQL 
to index it in another process (proposition 2).

That said you could use also a Message Queue in the middle if you want to be 
able at some point to stop your ES cluster without stopping your application.
This is what I did in the past.

My 2 cents

-- 
David Pilato | Technical Advocate | Elasticsearch.com
@dadoonet | @elasticsearchfr


On January 8, 2014 at 20:13:40, arthurX (fc28...@gmail.com) wrote:

Hello! I use MySQL as my primary datastore and use Elasticsearch to further 
index the documents.
My problem is keeping the data in ES in sync with MySQL.

Currently I have two methods in mind:
1. Whenever I add or update an entry in MySQL, do the same action in ES.
2. Run some cron jobs that periodically keep ES in sync with the data in MySQL.

For method 2, I wonder how I can check whether an entry is already indexed in 
Elasticsearch. And would it be efficient at all if I have to check every entry 
to see if it is updated? 

I am new to the technology and I am afraid I have missed some really obvious and 
established solutions here. Or otherwise, what is the normal way this situation 
is handled?


Very open Elasticsearch installation

2014-01-08 Thread Nikolas Everett
I've spent the past six months or so writing and deploying a replacement
on-site search system and we've finally decided it was time to blog about it:
http://blog.wikimedia.org/2014/01/06/wikimedia-moving-to-elasticsearch/.
I figure this list might find this useful because everything
(http://git.wikimedia.org/summary/?r=mediawiki/extensions/CirrusSearch.git) is
open source
(http://git.wikimedia.org/tree/operations%2Fpuppet.git/production/modules%2Felasticsearch)
and public
(http://ganglia.wikimedia.org/latest/?c=Elasticsearch%20cluster%20eqiad&m=cpu_report&r=hour&s=by%20name&hc=4&mc=2).
It looks like we're averaging about 3000 queries per second
(http://ganglia.wikimedia.org/latest/?r=day&cs=&ce=&m=es_queries&s=by+name&c=Elasticsearch+cluster+eqiad&h=&host_regex=&max_graphs=0&tab=m&vn=&hide-hf=false&sh=1&z=small&hc=4)
and about 300 updates per second
(http://ganglia.wikimedia.org/latest/?r=day&cs=&ce=&m=es_indexes&s=by+name&c=Elasticsearch+cluster+eqiad&h=&host_regex=&max_graphs=0&tab=m&vn=&hide-hf=false&sh=1&z=small&hc=4)
at the moment, which doesn't make us a very big installation, but we're
excited in our own little way.  If you have time please give it a shot here
(https://it.wikipedia.org/wiki/Speciale:Ricerca) or here
(https://en.wikisource.org/wiki/Special:Search) or here
(https://www.mediawiki.org/wiki/Special:Search) or, if you don't mind
somewhat uglier results, here (https://www.wikidata.org/wiki/Special:Search).
If you notice anything fishy please let me know or file a bug
(https://bugzilla.wikimedia.org/enter_bug.cgi?product=MediaWiki%20extensions&component=CirrusSearch).
We're a pretty small team but we'll get to everything eventually.

Wish me/us luck.  Over the next few months we'll be doubling the number of
documents indexed and doubling the update rate and ramping up the query
rate by about an order of magnitude.

Thanks for reading,

Nik



Re: Very open Elasticsearch installation

2014-01-08 Thread David Pilato
This is really awesome Nik!
Congrats to your team.

I'm a bit disappointed that this search gives no result: 
https://en.wikisource.org/w/index.php?title=Special%3ASearch&profile=default&search=elasticsearch&fulltext=Search
 :-)

Best

-- 
David Pilato | Technical Advocate | Elasticsearch.com
@dadoonet | @elasticsearchfr


On January 8, 2014 at 21:46:20, Nikolas Everett (nik9...@gmail.com) wrote:

I've spent the past six months or so writing and deploying a replacement on 
site search system and we've finally decided it was time to blog about it.  I 
figure this list might find this useful because everything is open source and 
public.  It looks like we're averaging about 3000 queries per second and about 
300 updates per second at the moment which doesn't make us a very big 
installation but we're excited in our own little way.  If you have time please 
give it a shot here or here or here or, if you don't mind somewhat uglier 
results, here.  If you notice anything fishy please let me know or  file a bug. 
 We're a pretty small team but we'll get to everything eventually.

Wish me/us luck.  Over the next few months we'll be doubling the number of 
documents indexed and doubling the update rate and ramping up the query rate by 
about an order of magnitude.

Thanks for reading,

Nik


Re: Strategy for keeping Elasticsearch updated with MySQL

2014-01-08 Thread Николай Колев
Hi Arthur,

I did something similar years ago when I was working for a newspaper.
We kept articles in a database, and full-text search was done with an external 
program. There was a trigger on the article tables that, on every change 
operation, added a record to a queue table. Something like this:
article_id, operation_type, table_name
Then there was a cron job every minute that read from this table and:
- On delete, deleted the entry
- On update, deleted the entry, generated a new simple page with the new 
article - only title and content - and handed it to the indexer to be indexed
- On insert, generated a new simple page with the new article - only title 
and content - and handed it to the indexer to be indexed

Articles were placed in a directory like this: 
/root_dir/table_name/id/content.html. This path was then returned 
and easily parsed to generate the appropriate link to the article.

After success, the respective record was removed from the queue, giving near 
realtime indexing.

This can be done with ES, but much more easily.
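The trigger-and-queue flow above can be sketched as follows, with an in-memory SQLite table as the queue and a plain dict standing in for the search index (all names are illustrative, not a real ES integration):

```python
import sqlite3

# In-memory queue table standing in for the database-side queue;
# a plain dict stands in for the search index.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE queue (article_id INTEGER, operation TEXT)")
db.executemany("INSERT INTO queue VALUES (?, ?)",
               [(1, "insert"), (2, "update"), (1, "delete")])

index = {}

def drain_queue():
    # Process queued changes in insertion order, then remove each
    # handled row, the way the cron job in the post did.
    rows = db.execute(
        "SELECT rowid, article_id, operation FROM queue ORDER BY rowid"
    ).fetchall()
    for rowid, article_id, op in rows:
        if op == "delete":
            index.pop(article_id, None)
        else:  # insert or update: (re)index title + content
            index[article_id] = "article %d" % article_id
        db.execute("DELETE FROM queue WHERE rowid = ?", (rowid,))
    db.commit()

drain_queue()
print(sorted(index))  # article 1 was inserted then deleted, so only 2 remains
```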

best regards,
Nickolay Kolev

Wednesday, January 8, 2014, 21:13:35 UTC+2, arthurX wrote:

 Hello! I use MySQL as my primary datastore and use Elasticsearch to 
 further index the documents.
 My problem is keeping the data in ES in sync with MySQL.

 Currently I have two methods in mind:
 1. Whenever I add or update an entry in MySQL, do the same action in ES.
 2. Run some cron jobs that periodically keep ES in sync with the data in 
 MySQL.

 For method 2, I wonder how I can check whether an entry is already indexed in 
 Elasticsearch. And would it be efficient at all if I have to check every 
 entry to see if it is updated? 

 I am new to the technology and I am afraid I had missed some really 
 obvious and established solutions here. Or otherwise the normal way this 
 situation is handled?




Re: Very open Elasticsearch installation

2014-01-08 Thread Nikolas Everett
On Wed, Jan 8, 2014 at 3:49 PM, David Pilato da...@pilato.fr wrote:

 This is really awesome Nik!
 Congrats to your team.

 I'm a bit disappointed that this search gives no result:
 https://en.wikisource.org/w/index.php?title=Special%3ASearch&profile=default&search=elasticsearch&fulltext=Search
  :-)


Looks like we don't have any books about Elasticsearch.  It does show up
here:
https://it.wikipedia.org/w/index.php?title=Speciale%3ARicerca&profile=default&search=elasticsearch&fulltext=Search
but I can't read it.  You can also find technical stuff about the
integration and our rollout plan over here:
https://www.mediawiki.org/w/index.php?title=Special%3ASearch&profile=default&search=elasticsearch&fulltext=Search.

I'll let you know when you can find
https://en.wikipedia.org/wiki/Elasticsearch with it but that might take
some time.  There is way too much search traffic for us to be the default
there.



Re: Is it possible to do a text query against a (pre)defined set of fields?

2014-01-08 Thread InquiringMind
Ville,

By default, the _all field includes all of the indexed fields. Then, for 
your private fields, explicitly exclude them from the _all field by adding 
the following to their properties:

"include_in_all": false


See the ES guide for more details. Specifically, this might help: 
http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/mapping-all-field.html
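As a sketch (the type and field names are hypothetical), a mapping that keeps two private fields out of _all could look like:

```json
{
  "mappings": {
    "doc": {
      "properties": {
        "title":       { "type": "string" },
        "secret_note": { "type": "string", "include_in_all": false },
        "owner_email": { "type": "string", "include_in_all": false }
      }
    }
  }
}
```

With this mapping, a query against _all covers only the public fields; a query that should also cover the private ones has to name them explicitly.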

I typically disable the _all field completely to cut down dramatically on 
disk space and build times. But everything else in ES has worked like a 
charm, so I'm sure this would work for you without too much trouble. Good 
luck!

Brian



Re: incrementally scaling ES from the small data

2014-01-08 Thread InquiringMind
Adolfo,

Still could not test how sockets relate to shards and why I automatically 
 get 10 established sockets when opening a client:

 node = builder.client(clientOnly).data(!clientOnly).local(local).node();

 client = node.client();


 on default ES configuration, and many many more sockets after (up to 200), 
 and how this number changes when increasing/decreasing number of shards, 


Of course, your application should create only one client and then let all 
threads within the application share that one client. Each client, 
especially the NodeClient, typically creates a thread pool behind it. It's 
a very heavy-weight object, so do not create more than one of them. But 
it's perfectly thread-safe and can (should) be used by as many threads in 
your application as desired.
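A small illustration of the share-one-client advice, using a stand-in class (not the real NodeClient API): build the heavy client exactly once and let every thread use it.

```python
import threading

class HeavyClient:
    """Stand-in for a heavy-weight client such as a NodeClient:
    expensive to construct, but thread-safe to share."""
    instances = 0

    def __init__(self):
        HeavyClient.instances += 1

    def search(self, query):
        return "results for " + query

client = HeavyClient()  # construct exactly one client...

results = []

def worker(query):
    # ...and let every thread share it instead of building its own.
    results.append(client.search(query))

threads = [threading.Thread(target=worker, args=("q%d" % i,))
           for i in range(8)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(HeavyClient.instances)  # → 1
```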

Brian 



Kibana Static Dashboard ?

2014-01-08 Thread Jay Wilson
I am modifying the guided.json dashboard. Down in the Events panel I would 
like to tell Kibana to statically filter out specific records. I tried 
adding this to the file.

  "query": {
    "filtered": {
      "query": {
        "bool": {
          "should": [
            {
              "query_string": {
                "query": "record-type: traffic-stats"
              }
            }
          ]
        }
      }
    }
  },

Doesn't appear to work.




Re: Very open Elasticsearch installation

2014-01-08 Thread joergpra...@gmail.com
I can't say it any other way: your move to ES is a landmark.

Thank you, Nik, for making this public; this helps me a lot in spreading
the word for more openness...

https://en.wikisource.org/w/index.php?title=Special%3ASearch&profile=default&search=Hello+World&fulltext=Search

The search suggestion is a bit surprising - but it does work :) and what a
difference to the old search https://de.wikisource.org/wiki/Spezial:Suche

Jörg

-- 
You received this message because you are subscribed to the Google Groups 
elasticsearch group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/CAKdsXoHKtgyS%2BKPuDNnhXbx6nc038O61TGze-mC6pVvTANMkGA%40mail.gmail.com.
For more options, visit https://groups.google.com/groups/opt_out.


Re: Kibana Static Dashboard ?

2014-01-08 Thread vineeth mohan
Hello Jay ,

Can't you do the same from the Kibana side by adding a must_not filter?
Then once you save that dashboard, you can always go back to the same link
to see the same static dashboard.

Thanks
 Vineeth


On Thu, Jan 9, 2014 at 2:42 AM, Jay Wilson jawro...@gmail.com wrote:

  I am modifying the guided.json dashboard. Down in Events panel I would
 like to tell kibana to statically filter out specific records. I tried
 adding this to the file.

   "query": {
     "filtered": {
       "query": {
         "bool": {
           "should": [
             {
               "query_string": {
                 "query": "record-type: traffic-stats"
               }
             }
           ]
         }
       }
     }
   },

 Doesn't appear to work.




Re: Kibana Static Dashboard ?

2014-01-08 Thread Jay Wilson
As I understand Kibana when a dashboard is saved, it is placed into 
elasticsearch. I don't want it in elasticsearch. I want it in a static file.



On Wednesday, January 8, 2014 2:32:50 PM UTC-7, vineeth mohan wrote:

 Hello Jay , 

 Can't you do the same from the Kibana side by adding a must_not filter?
 Then once you save that dashboard, you can always go back to the same 
 link to see the same static dashboard.

 Thanks
  Vineeth


 On Thu, Jan 9, 2014 at 2:42 AM, Jay Wilson jawr...@gmail.com wrote:

  I am modifying the guided.json dashboard. Down in Events panel I would 
 like to tell kibana to statically filter out specific records. I tried 
 adding this to the file.

   "query": {
     "filtered": {
       "query": {
         "bool": {
           "should": [
             {
               "query_string": {
                 "query": "record-type: traffic-stats"
               }
             }
           ]
         }
       }
     }
   },

 Doesn't appear to work.




allow_explicit_index and _bulk

2014-01-08 Thread Gabe Gorelick-Feldman
The documentation on URL-based access control 
(http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/url-access-control.html) 
implies that _bulk still works if you set rest.action.multi.allow_explicit_index: 
false, as long as you specify the index in the URL. However, I can't get it 
to work.

POST /foo/bar/_bulk
{ "index": {} }
{ "_id": "1234", "baz": "foobar" }

returns 

explicit index in bulk is not allowed

Should this work?



Re: incrementally scaling ES from the small data

2014-01-08 Thread Ivan Brusic
BTW, I was very wrong when I mentioned that elasticsearch uses consistent
hashing. It uses modulo-based hashing, which is why the number of shards
cannot change since the modulo is fixed. Working on too many things at once
while replying. :)
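Ivan's correction can be made concrete with a small sketch. Python's built-in hash stands in for the real routing hash, and shard_for is a hypothetical helper, not an ES API; the point is only why a fixed modulo pins the shard count:

```python
# Illustrative only: hash() stands in for Elasticsearch's real routing
# hash, and shard_for is a hypothetical helper, not an ES API.
def shard_for(doc_id, num_shards):
    # A document is routed by hashing its routing key (the _id by
    # default) modulo the number of primary shards.
    return hash(doc_id) % num_shards

# With a fixed shard count, routing is stable...
first = shard_for("doc-42", 5)
assert all(shard_for("doc-42", 5) == first for _ in range(100))

# ...but changing the shard count re-routes most documents, which is
# why the number of primary shards is fixed at index creation time.
moved = sum(1 for i in range(1000)
            if shard_for("doc-%d" % i, 5) != shard_for("doc-%d" % i, 6))
print(moved > 0)  # → True
```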


On Wed, Jan 8, 2014 at 1:10 PM, InquiringMind brian.from...@gmail.com wrote:

 Adolfo,


 Still could not test how sockets relate to shards and why I automatically
 get 10 established sockets when opening a client:

 node = builder.client(clientOnly).data(!clientOnly).local(local).node();

 client = node.client();


 on default ES configuration, and many many more sockets after (up to
 200), and how this number changes when increasing/decreasing number of
 shards,


 Of course, your application should create only one client and then let all
 threads within the application share that one client. Each client,
 especially the NodeClient, typically creates a thread pool behind it. It's
 a very heavy-weight object, so do not create more than one of them. But
 it's perfectly thread-safe and can (should) be used by as many threads in
 your application as desired.

 Brian



Re: No hit using scan/scroll with has_parent filter

2014-01-08 Thread Martijn v Groningen
Hi Jean,

Can you share how you execute the scan request with the has_parent filter?
(via a gist or something like that)

Martijn


On 8 January 2014 15:17, Jean-Baptiste Lièvremont 
jean-baptiste.lievrem...@sonarsource.com wrote:

 Hi folks,

 I use a parent/child mapping configuration which works flawlessly with
  classic search requests, e.g. using has_parent to find child documents
 with criteria on the parent documents.

 I am trying to get all child document IDs that match a given set of
 criteria using scan and scroll, which also works well - until I introduce
 the has_parent filter, in which case the scroll request returns no hit
 (although total_hits is correct).

 Is it a known issue?

 I can provide sample mapping files and queries with associated/expected
 results. Please note that this behavior has been noticed on 0.90.6 but is
 still present in 0.90.9.

 Thanks, best regards,
 -- Jean-Baptiste Lièvremont





-- 
Kind regards,

Martijn van Groningen
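For illustration, a request of the shape being discussed might be sketched as follows. This is an unverified sketch: the index, type, and field names ("myindex", "employee", "company", "country") are made up, and the exact `has_parent` options should be checked against the 0.90.x docs.

```python
import json

# Hedged sketch of a scan request body: a has_parent filter wrapped in a
# filtered query, selecting child docs whose parent matches a term.
scan_body = {
    "query": {
        "filtered": {
            "query": {"match_all": {}},
            "filter": {
                "has_parent": {
                    "parent_type": "company",
                    "query": {"term": {"country": "fr"}},
                }
            },
        }
    },
    # An empty fields list is intended to return only metadata (such as _id)
    # for each child hit.
    "fields": [],
}

payload = json.dumps(scan_body)
# This body would be POSTed to
#   /myindex/employee/_search?search_type=scan&scroll=1m
# and the returned _scroll_id then fed to /_search/scroll?scroll=1m.
```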



SSL and org.elasticsearch.transport.NodeDisconnectedException

2014-01-08 Thread Maciej Stoszko


I have an es_client (java/dropwizard) application. It communicates with 
elasticsearch just fine over a plaintext connection. 

I have followed the instructions at 
https://github.com/sonian/elasticsearch-jetty to set up SSL for es. 

However, when I start my es_client, it reports the following every 5 seconds:

INFO [2014-01-08 23:02:14,814] org.elasticsearch.client.transport: 
[Karolina Dean] failed to get node info for 
[#transport#-1][inet[localhost/127.0.0.1:9443]], disconnecting... ! 
org.elasticsearch.transport.NodeDisconnectedException: 
[][inet[localhost/127.0.0.1:9443]][cluster/nodes/info] disconnected

How can I go about figuring this one out?

Thanks, 

Maciej

 



Re: SSL and org.elasticsearch.transport.NodeDisconnectedException

2014-01-08 Thread Maciej Stoszko


On Wednesday, January 8, 2014 5:19:10 PM UTC-6, Maciej Stoszko wrote:

 I have an es_client (java/dropwizard) application. It communicates with 
 the elasticsearch just fine over plaintext connection. 

 I have followed the instructions at 
 https://github.com/sonian/elasticsearch-jetty to set up SSL for es. 

 However when I start my es_client it reports every 5 seconds the following:

 INFO [2014-01-08 23:02:14,814] org.elasticsearch.client.transport: 
 [Karolina Dean] failed to get node info for 
 [#transport#-1][inet[localhost/127.0.0.1:9443]], disconnecting... ! 
 org.elasticsearch.transport.NodeDisconnectedException: 
 [][inet[localhost/127.0.0.1:9443]][cluster/nodes/info] disconnected

 How can I go about figuring this one out?

 Thanks, 

 Maciej

 Actually, digging around a bit more, I think I should revise my question:
Is it currently possible to have the Java API client talk to Elasticsearch 
via SSL?
I see that https://github.com/elasticsearch/elasticsearch/pull/2105 (Add 
SSL support to Netty transport layer for Client/Node-to-Node communication) 
was rejected.
Maybe it is simply a feature which does not (yet) exist. 

  




Re: How to index an existing json file

2014-01-08 Thread ZenMaster80
Thank you for the binary flag tip. It is also in the documentation here:

http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/docs-bulk.html

On Tuesday, January 7, 2014 9:00:33 PM UTC-5, ZenMaster80 wrote:

 Hi,

 I am just starting with ElasticSearch, I would like to know how to index a 
 simple json document books.json that has the following in it: Where do I 
 place the document? I placed it in root directory of elastic search and in 
 /bin folder..

 {"books":[{"name":"life in heaven","author":"Mike Smith"},{"name":"get 
 rich","author":"Joe Shmoe"},{"name":"luxury properties","author":"Linda 
 Jones"}]}


 $ curl -XPUT "http://localhost:9200/books/book/1" -d @books.json

 Warning: Couldn't read data from file books.json, this makes an empty 
 POST.

 {error:MapperParsingException[failed to parse, document is 
 empty],status:400}


 Thanks
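For what it's worth, here is a small sketch of turning a wrapper document like the books.json above into the newline-delimited payload the bulk endpoint expects. The index/type names and ids are illustrative, and the JSON has been repaired (the quoted version had an unbalanced brace):

```python
import json

# Corrected contents of books.json.
books_doc = {
    "books": [
        {"name": "life in heaven", "author": "Mike Smith"},
        {"name": "get rich", "author": "Joe Shmoe"},
        {"name": "luxury properties", "author": "Linda Jones"},
    ]
}

def to_bulk_payload(doc, index="books", doc_type="book"):
    """Emit one action line plus one source line per book; the bulk API
    requires a trailing newline after the last line."""
    out = []
    for i, book in enumerate(doc["books"], start=1):
        out.append(json.dumps({"index": {"_index": index, "_type": doc_type, "_id": str(i)}}))
        out.append(json.dumps(book))
    return "\n".join(out) + "\n"

payload = to_bulk_payload(books_doc)
```

The payload could then be sent with `curl -XPOST "http://localhost:9200/_bulk" --data-binary @payload.json` — `--data-binary` rather than `-d`, so the newlines are preserved, as noted above.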




Filter and Query same taking some time

2014-01-08 Thread Arjit Gupta
Hi, 

I had implemented ES search queries for all our use cases, but when I learned 
that some of our use cases can be solved by filters, I implemented that too. 
However, I don't see any gain (in response time) from the filters. My search 
queries are:

1. Filter 

{
  "size" : 100,
  "query" : {
    "match_all" : { }
  },
  "filter" : {
    "bool" : {
      "must" : {
        "term" : {
          "color" : "red"
        }
      }
    }
  },
  "version" : true
}


2. Query 

{
  "size" : 100,
  "query" : {
    "bool" : {
      "must" : {
        "match" : {
          "color" : {
            "query" : "red",
            "type" : "boolean",
            "operator" : "AND"
          }
        }
      }
    }
  },
  "version" : true
}

By default the term filter should be cached, but I don't see a performance 
gain. 
Do I need to change some parameter as well?
I am using ES 0.90.1 with 16 GB of heap space given to ES. 

Thanks,
Arjit
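One thing worth checking (a sketch, not a verified fix for this dataset): in 0.90.x a top-level "filter" is applied to the hits after the query runs, so the cached term filter may not be restricting the document set the way a filter inside the query would. The form usually recommended is a "filtered" query:

```python
import json

# Sketch of the same search expressed as a "filtered" query, so the term
# filter (cached by default) restricts documents before any scoring work.
filtered_request = {
    "size": 100,
    "query": {
        "filtered": {
            "query": {"match_all": {}},
            "filter": {"term": {"color": "red"}},
        }
    },
    "version": True,
}

body = json.dumps(filtered_request)
```

Whether this helps here depends on the dataset and on repeated execution, since the filter cache only pays off after the first run.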



Why not use rivers in production?

2014-01-08 Thread Warner Onstine
I'm getting reintroduced to ES, and a co-worker recommended I listen to the
webinar intro that Drew Raines gave, as he mentioned something specific
about rivers.

Listening through it I heard him say that they (assuming ES) don't
recommend using rivers in production because it's tied to one node.

Having looked through a lot of the documentation on rivers I do see that
you can specify which rivers run on which nodes so I wasn't sure what the
exact implication was of this statement.

Drew? Or anyone else care to comment?

We're getting ready to push all of our data from MongoDB into ES so that we
can search it and use Kibana for analysis so any insight into this would be
great, thank you :).

-warner



Re: High load average running on ES node

2014-01-08 Thread Arjit Gupta
Hi Jörg,

Thanks a lot for your detailed reply.
Can you please explain how I can *reconfigure ES for efficient cache
usage*?

Thanks,
Arjit

Thanks ,
Arjit


On Sun, Jan 5, 2014 at 10:55 PM, joergpra...@gmail.com 
joergpra...@gmail.com wrote:

 The load is not much a surprise for an 8 core CPU node, I have also
 observed loads of 80-100.

 This high load, when induced by indexing, can be significantly reduced
 when using a high performance input/output disk subsystem, such as SSD. The
 disks are the slowest part in the system and generate high I/O wait which
 is responsible for increasing the CPU load.

 GC does generate high load too, this is mostly related to expensive
 queries that use filters or caches. The overall performance of the JVM is
 getting very poor in that case.

 You have several options:
 - rewriting queries or reconfiguring ES for efficient cache usage
 - adding nodes
 - decrease the heap slightly to smooth the steep edge when stop-the-world
 GC kicks in (but this depends on the workload if your ES cluster can work
 with less heap)

 G1 GC does not help against query/filter load and does not decrease CPU
 load; in fact, it puts more CPU load on the machines, trading that for
 shorter stop-the-world pauses. G1 GC helps to keep the stop-the-world
 periods under a certain limit so ES nodes do not disconnect as easily. It
 has no steep edge when performing stop-the-world GC phases.

 Please note, currently G1 GC seems safe only with Java 7 or Java 8 and ES
 versions that have replaced GNU trove4j with the HPPC library, that is,
 0.90.9 or 1.0.0.Beta2.

 Jörg
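By way of illustration only, "reconfiguring ES for efficient cache usage" usually means capping the filter and field-data caches in elasticsearch.yml. The setting names below are believed to apply to 0.90.x but should be verified against the docs for your exact version, and the values are placeholders, not recommendations:

```yaml
# elasticsearch.yml — illustrative values, verify against your 0.90.x docs
indices.cache.filter.size: 20%        # upper bound for the filter cache
indices.fielddata.cache.size: 30%     # upper bound for field data
indices.fielddata.cache.expire: 10m   # optional time-based eviction
```

Smaller caches mean more evictions but a smaller, more predictable heap footprint, which can reduce the stop-the-world GC pressure described above.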





Re: Pls help me: i insert log to elasticsearch, but it use too much memory, how to solve it?thanks

2014-01-08 Thread xjj210130
 The env is the following:
  -- elasticsearch v0.90 (I use 0.90.9; the problem still exists)
  -- Java version is 1.7.0_45

On Wednesday, January 8, 2014 6:58:02 PM UTC+8, xjj2...@gmail.com wrote:

 Dear all:
    I insert 1 logs into elasticsearch; each log is about 2M and 
 contains about 3000 keys and values.
  When I have inserted about 2, it uses about 30G of memory, and then 
 elasticsearch becomes very slow and it is hard to insert more logs.
  Could someone help me how to solve it? Thanks very much.




Re: Pls help me: i insert log to elasticsearch, but it use too much memory, how to solve it?thanks

2014-01-08 Thread David Pilato
Just wondering if you are hitting the same RAM usage when inserting without 
thrift?
Could you test it?

Could you gist as well what gives: 

curl -XGET 'http://localhost:9200/_nodes?all=true&pretty=true'


-- 
David Pilato | Technical Advocate | Elasticsearch.com
@dadoonet | @elasticsearchfr


On 9 January 2014 at 07:11:33, xjj210...@gmail.com (xjj210...@gmail.com) 
wrote:

 The env is following:
     --elasticseasrch  v0.90(  i use 0.90.9 , the problem is still exist).
     -- java version is 1.7.0_45

On Wednesday, January 8, 2014 6:58:02 PM UTC+8, xjj2...@gmail.com wrote:
Dear all:
       I insert 1 logs to elasticsearch, each log is about 2M, and there 
are about 3000 keys and values.
 when i insert about 2, it used about 30G memory, and then elasticsearch is 
very slow, and it's hard to insert log.
 Could someone help me how to solve it? Thanks very much.



Re: Why not use rivers in production?

2014-01-08 Thread David Pilato
A river instance is a singleton in the cluster.
It means that a river is working only on a single node.

It could be reallocated on another node when the first node fails.

I think that's what Drew meant. Basically, rivers do not scale.

My 2 cents

-- 
David Pilato | Technical Advocate | Elasticsearch.com
@dadoonet | @elasticsearchfr


On 9 January 2014 at 06:15:24, Warner Onstine (warn...@gmail.com) wrote:

Getting reintroduced to ES and a co-worker recommended I listen to the webinar 
intro that Drew Raines gave as he mentioned something specific about rivers.

Listening through it I heard him say that they (assuming ES) don't recommend 
using rivers in production because it's tied to one node.

Having looked through a lot of the documentation on rivers I do see that you 
can specify which rivers run on which nodes so I wasn't sure what the exact 
implication was of this statement.

Drew? Or anyone else care to comment?

We're getting ready to push all of our data from MongoDB into ES so that we can 
search it and use Kibana for analysis so any insight into this would be great, 
thank you :).

-warner



Re: Filter and Query same taking some time

2014-01-08 Thread David Pilato
You probably won't see any difference the first time you execute it unless you 
are using warmers.
With a second query, you should see the difference.

How many documents you have in your dataset?

-- 
David Pilato | Technical Advocate | Elasticsearch.com
@dadoonet | @elasticsearchfr


On 9 January 2014 at 06:14:06, Arjit Gupta (arjit...@gmail.com) wrote:

Hi, 

I had implemented ES search query  for all our use cases but when i came to 
know that some of our use cases can be solved by filters I implemented that but 
I dont see any gain (in response time) in filters. My search queries  are 

1. Filter 

{
  "size" : 100,
  "query" : {
    "match_all" : { }
  },
  "filter" : {
    "bool" : {
      "must" : {
        "term" : {
          "color" : "red"
        }
      }
    }
  },
  "version" : true
}


2. Query 

{
  "size" : 100,
  "query" : {
    "bool" : {
      "must" : {
        "match" : {
          "color" : {
            "query" : "red",
            "type" : "boolean",
            "operator" : "AND"
          }
        }
      }
    }
  },
  "version" : true
}

By default the term query should be cached but I dont see a performance gain. 
Do i need to change some parameter also  ?
I am using ES  0.90.1 and with 16Gb of heap space given to ES. 

Thanks,
Arjit



Re: Filter and Query same taking some time

2014-01-08 Thread Arjit Gupta
I have 100,000 documents which are similar. In the response I am getting the
whole document, not just the id.
I am executing the query multiple times.

Thanks ,
Arjit


On Thu, Jan 9, 2014 at 1:06 PM, David Pilato da...@pilato.fr wrote:

 You probably won't see any difference the first time you execute it unless
 you are using warmers.
 With a second query, you should see the difference.

 How many documents you have in your dataset?

 --
 *David Pilato* | *Technical Advocate* | *Elasticsearch.com*
 @dadoonet https://twitter.com/dadoonet | 
 @elasticsearchfrhttps://twitter.com/elasticsearchfr


 On 9 January 2014 at 06:14:06, Arjit Gupta (arjit...@gmail.com) wrote:

 Hi,

 I had implemented ES search query  for all our use cases but when i came
 to know that some of our use cases can be solved by filters I implemented
 that but I dont see any gain (in response time) in filters. My search
 queries  are

 1. Filter

 {
   "size" : 100,
   "query" : {
     "match_all" : { }
   },
   "filter" : {
     "bool" : {
       "must" : {
         "term" : {
           "color" : "red"
         }
       }
     }
   },
   "version" : true
 }


 2. Query

 {
   "size" : 100,
   "query" : {
     "bool" : {
       "must" : {
         "match" : {
           "color" : {
             "query" : "red",
             "type" : "boolean",
             "operator" : "AND"
           }
         }
       }
     }
   },
   "version" : true
 }

 By default the term query should be cached but I dont see a performance
 gain.
 Do i need to change some parameter also  ?
 I am using ES  0.90.1 and with 16Gb of heap space given to ES.

 Thanks,
 Arjit





Re: SSL and org.elasticsearch.transport.NodeDisconnectedException

2014-01-08 Thread David Pilato
The jetty plugin replaces the HTTP layer (9200), not the transport layer 
(9300). The TransportClient uses the transport layer (9300).


-- 
David Pilato | Technical Advocate | Elasticsearch.com
@dadoonet | @elasticsearchfr


On 9 January 2014 at 02:02:48, Maciej Stoszko (maciek...@gmail.com) wrote:



On Wednesday, January 8, 2014 5:19:10 PM UTC-6, Maciej Stoszko wrote:
I have an es_client (java/dropwizard) application. It communicates with the 
elasticsearch just fine over plaintext connection. 

I have followed the instructions at 
https://github.com/sonian/elasticsearch-jetty to set up SSL for es. 

However when I start my es_client it reports every 5 seconds the following:

INFO [2014-01-08 23:02:14,814] org.elasticsearch.client.transport: [Karolina 
Dean] failed to get node info for 
[#transport#-1][inet[localhost/127.0.0.1:9443]], disconnecting... ! 
org.elasticsearch.transport.NodeDisconnectedException: 
[][inet[localhost/127.0.0.1:9443]][cluster/nodes/info] disconnected

How can I go about figuring this one out?

Thanks, 

Maciej


Actually digging around a bit more, I think I should revise my question:
Is it currently possible to have JAVA API client talking to Elasticsearch via 
SSL.
I see that https://github.com/elasticsearch/elasticsearch/pull/2105 (Add SSL 
support to Netty transport layer for Client/Node-to-Node communication) was 
rejected.
Maybe it is simply a feature which does not (yet) exist. 
 



Elasticsearch Hadoop

2014-01-08 Thread Badal Mohapatra
Hi,

   To index Hadoop data into elasticsearch, as I understand it, we create an 
external table with the EsStorageHandler and then copy the data from another 
internal Hive table. Doesn't this duplicate the data in HDFS?
Is there any way to index from the internal Hive tables directly, instead 
of having two tables with the same data?

Kind Regards,
Badal



Re: Converting queries returning certain distinct records to ES

2014-01-08 Thread David Pilato
Maybe you could find a way to do this with a single query if you design your 
documents in another way?
Or use facets for the first query and an ids filter for the second?
It's hard to tell without a concrete example of JSON documents.

-- 
David Pilato | Technical Advocate | Elasticsearch.com
@dadoonet | @elasticsearchfr


On 9 January 2014 at 01:28:06, heat...@hodgetastic.com 
(heat...@hodgetastic.com) wrote:

Hello

I am currently trying to migrate an sql application to Elasticsearch. 

I need to be able to select a collection of results from an index which, for 
given search conditions, have distinct pairings of two certain columns. In sql 
I do the following two queries:

Query 1:

SELECT column_A, column_B, GROUP_CONCAT(table_name.id) id FROM `table_name` 
WHERE `column_?` = 'something' GROUP BY column_A, column_B, column_?
Query 2:

 SELECT `table_name`.* FROM `table_name` WHERE `column_?` = 'something' 
AND (`table_name.id` IN (ids_from_previous_query))
The first query returns me a list of ids from table_name such that each id 
satisfies the condition `column_?` = 'something' and the record with that id 
has a distinct [column_A,column_B]

The second query then returns me all the records satisfying `column_?` = 
'something' but only from that range of ids (I realise I probably do not need 
to repeat `column_?` = 'something' in the second query.)

The result is that each record returned by the second query satisfies the 
condition `column_?` = 'something', and I am returned only one record for 
each [column_A, column_B] pairing.

Since there is not really a 'distinct' option yet I am having trouble finding a 
way replicate this output with ES and wondered if anyone might have any 
thoughts as how I might go about it?

At the moment I am open to any mapping / query combinations that will achieve 
what I need.
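To make the facets-then-ids idea concrete, here is a rough sketch. The column names are the poster's placeholders, the `script_field` expression and facet size are assumptions to verify against the 0.90 terms facet docs, and note a gap versus the SQL: a terms facet returns the distinct pair values and counts, not the ids, so a follow-up lookup per facet term would still be needed to choose one id per pair before running step 2.

```python
import json

# Step 1: a terms facet over the concatenated pair finds the distinct
# (column_A, column_B) combinations among matching documents.
facet_request = {
    "size": 0,
    "query": {"term": {"column_x": "something"}},
    "facets": {
        "distinct_pairs": {
            "terms": {
                "script_field": "_source.column_A + '|' + _source.column_B",
                "size": 1000,
            }
        }
    },
}

# Step 2: fetch one representative document per pair with an ids filter,
# using one id chosen for each facet term found in step 1 (values made up).
ids_request = {
    "query": {
        "filtered": {
            "query": {"match_all": {}},
            "filter": {"ids": {"values": ["1", "7", "42"]}},
        }
    }
}

bodies = [json.dumps(facet_request), json.dumps(ids_request)]
```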

