Re: ingest performance degrades sharply along with the documents having more fields

2014-06-13 Thread Mark Walkom
It's not surprising that the time increases when you have an order of
magnitude more fields.

Are you using the bulk API?
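For reference, a hedged sketch of a bulk request (the _bulk endpoint is real; the index, type, and field names below are placeholders, with field suffixes chosen to match the dynamic templates discussed later in this thread):

```shell
# One HTTP round trip indexes many documents; each action line
# is followed by its source line, and the body must end with a newline.
curl -XPOST "localhost:9200/myindex/mytype/_bulk" -d '
{ "index": {} }
{ "f1_ss": "value1", "f2_i": 42 }
{ "index": {} }
{ "f1_ss": "value2", "f2_i": 43 }
'
```

Batching hundreds of documents per request avoids the per-request overhead of one-doc-per-curl ingestion.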

Regards,
Mark Walkom

Infrastructure Engineer
Campaign Monitor
email: ma...@campaignmonitor.com
web: www.campaignmonitor.com


On 13 June 2014 15:57, Maco Ma mayaohu...@gmail.com wrote:

 I'm trying to measure the performance of ingesting documents that have lots
 of fields.


 The latest elasticsearch 1.2.1:
 Total docs count: 10k (a small set definitely)
 ES_HEAP_SIZE: 48G
 settings:

 {"doc":{"settings":{"index":{"uuid":"LiWHzE5uQrinYW1wW4E3nA","number_of_replicas":"0","translog":{"disable_flush":"true"},"number_of_shards":"5","refresh_interval":"-1","version":{"created":"1020199"}}}}}

 mappings:

 {"doc":{"mappings":{"type":{"dynamic_templates":[{"t1":{"mapping":{"store":false,"norms":{"enabled":false},"type":"string"},"match":"*_ss"}},{"t2":{"mapping":{"store":false,"type":"date"},"match":"*_dt"}},{"t3":{"mapping":{"store":false,"type":"integer"},"match":"*_i"}}],"_source":{"enabled":false},"properties":{}}}}}

 All fields in the documents match the templates in the mappings.

 Since I disabled flush & refresh, I submitted the flush command (along
 with an optimize command after it) from the client program every 10 seconds.
 (I also tried an interval of 10 minutes and got similar results.)

 Scenario 0 - 10k docs have 1000 different fields:
 Ingestion took 12 secs.  Only 1.08 GB of heap memory is used (this counts
 used heap only).


 Scenario 1 - 10k docs have 10k different fields (10x the fields of
 Scenario 0):
 This time ingestion took 29 secs.  Only 5.74 GB of heap memory is used.

 Not sure why the performance degrades sharply.

 If I try to ingest docs having 100k different fields, it takes 17
 mins 44 secs.  We only have 10k docs in total, and I'm not sure why ES
 performs so badly.

 Can anyone suggest how to improve the performance?









-- 
You received this message because you are subscribed to the Google Groups 
elasticsearch group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/CAEM624bVPUUUAWJAaeLKwTrzSjprtdbFpp_SkBPHRkLxOdUaHg%40mail.gmail.com.
For more options, visit https://groups.google.com/d/optout.


Re: index template updating problem

2014-06-13 Thread Ivan Brusic
The index template will only be applied when a new index is created.
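A short sketch of that behavior (the template name, pattern, and index names below are assumptions):

```shell
# Update the template; indices that already exist keep their old mappings.
curl -XPUT "localhost:9200/_template/logs_template" -d '
{
  "template": "logs-*",
  "mappings": { "type": { "_source": { "enabled": false } } }
}'

# Only an index created after the update picks up the new template.
curl -XPUT "localhost:9200/logs-20140614"
```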

-- 
Ivan


On Thu, Jun 12, 2014 at 5:54 AM, sri 1.fr@gmail.com wrote:

 Hello all,

 If I update the mapping in an existing index template, the change is not
 reflected automatically; I have to manually delete the old mapping and then
 apply the template again.

 So my question is: is ES designed to work this way, or should the mapping
 changes be applied automatically?

 Thanks and Regards
 Sri





Re: ingest performance degrades sharply along with the documents having more fields

2014-06-13 Thread Maco Ma
I used curl commands to do the ingestion (one command per doc) and the
flush. I also tried Solr (disabled the soft/hard commits & issued commits
from the client program) with the same data & commands, and its performance
did not degrade. Lucene is used by both of them, and I'm not sure why there
is such a big difference in performance.



Re: Elastic Search and consistency

2014-06-13 Thread shikhar
On Thu, Jun 12, 2014 at 8:52 PM, shikhar shik...@schmizz.net wrote:

 ES currently does not seem to provide any guarantee that an acknowledged
 write (from the caller's perspective) succeeded on a quorum of replicas.


I take this back; I understand the ES model better now. Although the
write-consistency-level check is only applied before the write is about to
be issued, with sync replication the client can only get an ack if the write
succeeded on the primary shard as well as all replicas (per the same
cluster state the check was performed on). In case it fails on some
replica(s), the operation is retried (together with the
write-consistency-level check, using a possibly-updated cluster state).


 This makes it unsuitable for a primary data store, given you can see data
 loss despite having replicas!


If using ES as a primary store, you should really be running it with
*index.gateway.local.sync: 0*
to make sure the translog fsyncs on every write operation.
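As a config sketch, that would be the following line in elasticsearch.yml (setting name as given above; verify against your version's documentation):

```
index.gateway.local.sync: 0
```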



A follow-up question: what if there is a failure on one of the replicas
that prevents writes (e.g. disk full) but does not cause the node to drop
out of the cluster, since it is otherwise healthy? Does that not make the
node a SPOF? This is something we have run into with SolrCloud
(https://issues.apache.org/jira/browse/SOLR-5805).



Need help, multiple aggregations with filters extremely slow, where to look for optimizations?

2014-06-13 Thread Thomas
Hi,

I'm facing a performance issue with some aggregations I perform, and I need 
your help if possible:

I have two documents, the *request* and the *event*. The request is the 
parent of the event. Below is a (sample) mapping:

"event": {
  "dynamic": "strict",
  "_parent": {
    "type": "request"
  },
  "properties": {
    "event_time": {
      "format": "dateOptionalTime",
      "type": "date"
    },
    "count": {
      "type": "integer"
    },
    "event": {
      "index": "not_analyzed",
      "type": "string"
    }
  }
}

"request": {
  "dynamic": "strict",
  "_id": {
    "path": "uniqueId"
  },
  "properties": {
    "uniqueId": {
      "index": "not_analyzed",
      "type": "string"
    },
    "user": {
      "index": "not_analyzed",
      "type": "string"
    },
    "code": {
      "type": "integer"
    },
    "country": {
      "index": "not_analyzed",
      "type": "string"
    },
    "city": {
      "index": "not_analyzed",
      "type": "string"
    }
  }
}

My cluster is becoming really big (almost 2 TB of data with billions of 
documents) and I maintain one index per day, occasionally deleting 
old indices. My daily index is about 20 GB. The version of elasticsearch 
that I use is 1.1.1.

My problems start when I want to get some aggregations of events with some 
criteria applied on the parent request document. For example: count the 
events of type *click* for country = US and code = 12. What I was 
initially doing was to generate a scriptFilter for the request document (in 
Groovy), and I was adding multiple aggregations in one search request. This 
ended up being very slow, so I removed the scripting logic and implemented 
that logic in Java code.

What seemed to be solved on my local machine changed nothing when I got back 
to the cluster; my app still performs really poorly. It takes more than 10 
seconds to perform a search with ~10 sub-aggregations.

What seems strange is that the cluster looks pretty OK with regard to 
load average, CPU, etc.

Any hints on where to look to identify the bottleneck?

*Ask for any additional information to provide*; I didn't want to make this 
post too long to read.
Thank you



Re: Need help, multiple aggregations with filters extremely slow, where to look for optimizations?

2014-06-13 Thread Adrien Grand
Can you show us what your request looks like? (including query and aggs)





-- 
Adrien Grand



How To Disable Recovery Process / Delete Old Shards

2014-06-13 Thread seja12
During a botched upgrade process my data was deleted. As it was a test server,
it didn't matter. However, upon reinstall it just constantly tries to
recover old shards, even after deleting every known file on the server that
contains Elasticsearch data. Can someone let me know how to disable the
recovery process, and where Elasticsearch hides the file it reads to see what
files to recover?

Below is an example from the log file (repeated constantly):

] [Quicksand] [blurays][1] recovery from
[[Stilt-Man][mxmoAlTaTkClmfpImcpb1A][254020-ipaddress]] failed
org.elasticsearch.transport.RemoteTransportException:
[Stilt-Man][inet[/ipaddress]][index/shard/recovery/startRecovery]
Caused by: org.elasticsearch.index.engine.RecoveryEngineException:
[blurays][1] Phase[1] Execution failed
at
org.elasticsearch.index.engine.internal.InternalEngine.recover(InternalEngine.java:996)
at
org.elasticsearch.index.shard.service.InternalIndexShard.recover(InternalIndexShard.java:631)
at
org.elasticsearch.indices.recovery.RecoverySource.recover(RecoverySource.java:122)
at
org.elasticsearch.indices.recovery.RecoverySource.access$1600(RecoverySource.java:62)
at
org.elasticsearch.indices.recovery.RecoverySource$StartRecoveryTransportRequestHandler.messageReceived(RecoverySource.java:351)
at
org.elasticsearch.indices.recovery.RecoverySource$StartRecoveryTransportRequestHandler.messageReceived(RecoverySource.java:337)
at
org.elasticsearch.transport.netty.MessageChannelHandler$RequestHandler.run(MessageChannelHandler.java:270)
at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:724)
Caused by: org.elasticsearch.indices.recovery.RecoverFilesRecoveryException:
[blurays][1] Failed to transfer [1] files with total size of [71b]
at
org.elasticsearch.indices.recovery.RecoverySource$1.phase1(RecoverySource.java:243)
at
org.elasticsearch.index.engine.internal.InternalEngine.recover(InternalEngine.java:993)
... 9 more
Caused by: java.nio.file.NoSuchFileException:
/home/programs/elasticsearch-1.2.1/data/elasticsearch/nodes/1/indices/blurays/1/index/segments_1
at 
sun.nio.fs.UnixException.translateToIOException(UnixException.java:86)
at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:102)
at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:107)
at
sun.nio.fs.UnixFileSystemProvider.newFileChannel(UnixFileSystemProvider.java:176)
at java.nio.channels.FileChannel.open(FileChannel.java:287)
at java.nio.channels.FileChannel.open(FileChannel.java:334)
at 
org.apache.lucene.store.MMapDirectory.openInput(MMapDirectory.java:193)
at
org.apache.lucene.store.FilterDirectory.openInput(FilterDirectory.java:80)
at org.elasticsearch.index.store.Store.openInputRaw(Store.java:319)
at
org.elasticsearch.indices.recovery.RecoverySource$1$1.run(RecoverySource.java:189)
... 3 more



--
View this message in context: 
http://elasticsearch-users.115913.n3.nabble.com/How-To-Disable-Recovery-Process-Delete-Old-Shards-tp4057556.html
Sent from the ElasticSearch Users mailing list archive at Nabble.com.



Re: Need help, multiple aggregations with filters extremely slow, where to look for optimizations?

2014-06-13 Thread Thomas
Below is an example aggregation I perform; are there any optimizations I can 
make? Maybe disabling some features I do not need, etc.

curl -XPOST "http://localhost:9200/logs-idx.20140613/event/_search?search_type=count" -d '
{
  "aggs": {
    "f1": {
      "filter": {
        "or": [
          {
            "and": [
              {
                "has_parent": {
                  "type": "request",
                  "filter": {
                    "and": {
                      "filters": [
                        { "term": { "country": "US" } },
                        { "term": { "city": "NY" } },
                        { "term": { "code": 12 } }
                      ]
                    }
                  }
                }
              },
              {
                "range": {
                  "event_time": {
                    "gte": "2014-06-13T10:00:00",
                    "lt": "2014-06-13T11:00:00"
                  }
                }
              }
            ]
          },
          {
            "and": [
              {
                "has_parent": {
                  "type": "request",
                  "filter": {
                    "and": {
                      "filters": [
                        { "term": { "country": "US" } },
                        { "term": { "city": "NY" } },
                        { "term": { "code": 12 } },
                        {
                          "range": {
                            "request_time": {
                              "gte": "2014-06-13T10:00:00",
                              "lt": "2014-06-13T11:00:00"
                            }
                          }
                        }
                      ]
                    }
                  }
                }
              },
              {
                "range": {
                  "event_time": { "lt": "2014-06-13T10:00:00" }
                }
              }
            ]
          }
        ]
      },
      "aggs": {
        "per_interval": {
          "date_histogram": {
            "field": "event_time",
            "interval": "minute"
          },
          "aggs": {
            "metrics": {
              "terms": { "field": "event", "size": 10 }
            }
          }
        }
      }
    }
  }
}'



does document database mean denormalize

2014-06-13 Thread eunever32
What I am asking is:

Do different design decisions apply in Elasticsearch compared to relational 
databases?

Is denormalized data better for Elasticsearch?



Re: Accessing Search Templates via Rest

2014-06-13 Thread Sebastian Gräser
So I guess it's not possible?

On Tuesday, 10 June 2014 16:58:31 UTC+2, Sebastian Gräser wrote:

 Hello,

 maybe someone can help me. Is there a way to get the available search 
 templates via the REST API? I haven't found a way yet; hope you can help me.

 Best regards
 Sebastian




Re: Need help, multiple aggregations with filters extremely slow, where to look for optimizations?

2014-06-13 Thread Adrien Grand
Is this request only about getting aggregations? If so, you would probably
get better response times by putting the filter in the query part (under a
filtered query) and only keeping the date histogram in the aggregation. The
reason is that aggregations are computed on matches, and when no query
is specified, that means all documents in your index.
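A trimmed sketch of that shape, using the field names from this thread (only one of the original filter branches is shown for brevity):

```shell
# Filter in the query so the aggregation only runs over matching docs.
curl -XPOST "localhost:9200/logs-idx.20140613/event/_search?search_type=count" -d '
{
  "query": {
    "filtered": {
      "filter": {
        "range": {
          "event_time": {
            "gte": "2014-06-13T10:00:00",
            "lt": "2014-06-13T11:00:00"
          }
        }
      }
    }
  },
  "aggs": {
    "per_interval": {
      "date_histogram": { "field": "event_time", "interval": "minute" }
    }
  }
}'
```

Because the filtered query restricts the matched documents up front, the aggregation no longer scans the whole index.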



Re: Java API ES 0.90.9 Array (2 elements) in search result gets only one value in SearchHitField.getValues()

2014-06-13 Thread Martin Pape
I've tested it with ES 1.1 and the described behaviour is gone, so the Java 
API now interprets the JSON search result correctly.

On Thursday, January 30, 2014 11:23:04 PM UTC+1, Martin Pape wrote:

 Thanks for the information. I still have some months till production, so I 
 might work around it for now and wait for ES 1.0. Anyone know when ES 1.0 is 
 planned to be released?

 BR - Martin

 On Thursday, January 30, 2014 6:54:58 PM UTC+1, Binh Ly wrote:

 Martin, I have verified this behavior and it still persists in 0.90.10. I 
 checked the latest ES master build and it indeed returns 2 values in the 
 List as expected, so I expect it to behave as you expect in ES 1.0. 
 For now, what it does is return a single item inside the List, but that 
 item is in turn an ArrayList of the 2 String values. If you have only 1 value, 
 it returns a single item in the List, and that item is a String. So you can 
 test accordingly in code whether the value is a String or an ArrayList 
 and adjust accordingly. Should be rectified in 1.0 I hope. :)





Cassandra with JDBC river plugin

2014-06-13 Thread Abhishek Mukherjee
Hi Everyone,

I am trying to move data from Cassandra to Elasticsearch. Initially I tried 
the cassandra-river at https://github.com/eBay/cassandra-river. However, I 
got a timed-out error, which I suspect was originating from the Hector API. I 
posted a question on this thread: 
https://groups.google.com/forum/#!searchin/elasticsearch/cassandra/elasticsearch/4oDbkqK3GVA/W9WLK4SS2MEJ.

Moving on, I thought of using the JDBC river at 
https://github.com/jprante/elasticsearch-river-jdbc with a Java driver for 
Cassandra. I followed the MySQL example and modified it for Cassandra. I 
created the river as follows:

curl -XPUT '192.168.1.103:9200/_river/my_jdbc_river/_meta' -d '{
  "type": "jdbc",
  "jdbc": {
    "url": "jdbc:cassandra://192.168.1.105:9160/transactionlogdb",
    "cql": "select * from logs"
  }
}'

{"_index":"_river","_type":"my_jdbc_river","_id":"_meta","_version":1,"created":true}

However, I don't see any documents being created in the jdbc index. Am I 
missing something? Any help or tips are very much appreciated. Thanks in 
advance.

Kind Regards,
Abhishek Mukherjee



Re: ES 1.2.1 sort by _timestamp

2014-06-13 Thread Stefan Eberl


On Thursday, June 12, 2014 6:52:16 PM UTC+2, Itamar Syn-Hershko wrote:

 This is weird. Are you sure what you are seeing is not overridden 
 documents (can happen if you specify the ID yourself)? Can you add the 
 _timestamp field to the results and verify the documents are indeed not 
 sorted by _timestamp?

The id is also automatically generated by ES.
Do I need to store the _timestamp field to be able to retrieve it using
"fields": ["_timestamp"] in my query?



Re: ES 1.2.1 sort by _timestamp

2014-06-13 Thread Itamar Syn-Hershko
Possibly, because it's not provided in the _source, or just use this:
http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/mapping-timestamp-field.html#_path_2
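A minimal sketch (index and type names below are assumptions) of a mapping that stores _timestamp so it can be returned via fields:

```shell
# _timestamp must be enabled (and stored) in the mapping at index-creation
# time for "fields": ["_timestamp"] to return it in search results.
curl -XPUT "localhost:9200/myindex" -d '
{
  "mappings": {
    "mytype": {
      "_timestamp": { "enabled": true, "store": true }
    }
  }
}'
```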

--

Itamar Syn-Hershko
http://code972.com | @synhershko https://twitter.com/synhershko
Freelance Developer  Consultant
Author of RavenDB in Action http://manning.com/synhershko/


On Fri, Jun 13, 2014 at 11:26 AM, Stefan Eberl cpppw...@gmail.com wrote:



 On Thursday, June 12, 2014 6:52:16 PM UTC+2, Itamar Syn-Hershko wrote:

 This is weird. Are you sure what you are seeing is not overridden
 documents (can happen if you specify the ID yourself)? Can you add the
 _timestamp field to the results and verify the documents are indeed not
 sorted by _timestamp?

 The id is also automatically generated by ES.
 Do I need to store the _timestamp field to be able to retrieve it using
 "fields": ["_timestamp"] in my query?





Re: Need help, multiple aggregations with filters extremely slow, where to look for optimizations?

2014-06-13 Thread Thomas
So I restructured my curl as follows; is this what you mean? By doing some 
first hits I do get a slight improvement, but I need to check it against 
production data:

Thank you will try it and come back with results

curl -XPOST "http://10.129.2.42:9200/logs-idx.20140613/event/_search?search_type=count" -d'
{
  "query": {
    "filtered": {
      "filter": {
        "or": [
          {
            "and": [
              {
                "has_parent": {
                  "type": "request",
                  "filter": {
                    "and": {
                      "filters": [
                        { "term": { "country": "US" } },
                        { "term": { "city": "NY" } },
                        { "term": { "code": 12 } }
                      ]
                    }
                  }
                }
              },
              {
                "range": {
                  "event_time": {
                    "gte": "2014-06-13T10:00:00",
                    "lt": "2014-06-13T11:00:00"
                  }
                }
              }
            ]
          },
          {
            "and": [
              {
                "has_parent": {
                  "type": "request",
                  "filter": {
                    "and": {
                      "filters": [
                        { "term": { "country": "US" } },
                        { "term": { "city": "NY" } },
                        { "term": { "code": 12 } },
                        {
                          "range": {
                            "request_time": {
                              "gte": "2014-06-13T10:00:00",
                              "lt": "2014-06-13T11:00:00"
                            }
                          }
                        }
                      ]
                    }
                  }
                }
              },
              {
                "range": {
                  "event_time": {
                    "lt": "2014-06-13T10:00:00"
                  }
                }
              }
            ]
          }
        ]
      }
    }
  },
  "aggs": {
    "per_interval": {
      "date_histogram": {
        "field": "event_time",
        "interval": "minute"
      },
      "aggs": {
        "metrics": {
          "terms": {
            "field": "event",
            "size": 12
          }
        }
      }
    }
  }
}'


On Friday, 13 June 2014 10:09:46 UTC+3, Thomas wrote:

 Hi,

 I'm facing a performance issue with some aggregations I perform, and I 
 need your help if possible:

 I have two document types, the *request* and the *event*. The request is the 
 parent of the event. Below is a (sample) mapping:

 "event": {
     "dynamic": "strict",
     "_parent": {
         "type": "request"
     },
     "properties": {
         "event_time": {
             "format": "dateOptionalTime",
             "type": "date"
         },
         "count": {
             "type": "integer"
         },
         "event": {
             "index": "not_analyzed",
             "type": "string"
         }
     }
 }

 "request": {
     "dynamic": "strict",
     "_id": {
         "path": "uniqueId"
     },
     "properties": {
         "uniqueId": {
             "index": "not_analyzed",
             "type": "string"
         },
         "user": {
             "index": "not_analyzed",
             "type": "string"
         },
         "code": {
             "type": "integer"
         },
         "country": {
             "index": "not_analyzed",
             "type": "string"
         },
         "city": {
             "index": "not_analyzed",
             "type": "string"
         }
     }
 }

 My cluster is becoming really big (almost 2 TB of data with billions of 
 documents) and I maintain one index per day, occasionally deleting old 
 indices. My daily index is about 20 GB. The version of elasticsearch that I 
 use is 1.1.1.

 My problems start when I want to get some aggregations of events with some 
 criteria applied to the parent request document, for example counting the 
 events of type *click* for country = US and code = 12. What I was initially 
 doing was to generate a scriptFilter for the request document (in Groovy), 
 and I was adding multiple aggregations in one search request. This ended up 
 being very slow, so I removed the scripting logic and implemented that 
 logic in Java code.

 What seemed to be solved on my local machine changed nothing when I got 
 back to the cluster: again my app performs really, really poorly. I get 
 more than 10 seconds to perform a search with ~10

Mapping for a hash map

2014-06-13 Thread Manuel Vacelet
Hi there,

I'd like to define a mapping for a hash map but I cannot manage to get it 
right.
Here is the kind of documents I'd like to index:
{
    "message": "Elasticsearch test 1",
    "dates": {
        "create": "2014-01-11",
        "update": "2014-06-12"
    }
}

{
    "message": "Elasticsearch test 2",
    "dates": {
        "date_1": "2014-01-11"
    }
}

Note: date_1 is on purpose; I cannot know at mapping-definition time how many 
dates I will have to deal with.

As is, without a mapping, it works automagically (probably thanks to type 
autodetection), but is there a way to get it done without relying on that?

My problem is that I might have stuff like this too:
{
    "message": "Elasticsearch test 3",
    "strings": {
        "string_1": "some text",
        "string_2": "2014-01-11"
    }
}

{
    "message": "Elasticsearch test 4",
    "strings": {
        "string_2": "some other text"
    }
}


In this case I need to be able to enforce that string_2 is not a date.

What is the right way to do it?
Manuel
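[Editor's note] One approach that fits this shape of document is a dynamic template keyed on the field path, so everything under strings.* is forced to string even when the value looks like a date. A rough sketch (type and template names are illustrative, not verified against a live 1.x cluster):

```json
{
  "mytype": {
    "dynamic_templates": [
      {
        "dates_are_dates": {
          "path_match": "dates.*",
          "mapping": { "type": "date" }
        }
      },
      {
        "strings_are_strings": {
          "path_match": "strings.*",
          "mapping": { "type": "string" }
        }
      }
    ]
  }
}
```

With this in place, string_2 would be mapped as string regardless of what its first value happens to look like.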



Re: Cassandra with JDBC river plugin

2014-06-13 Thread Abhishek Mukherjee
Checking the Elasticsearch log files I found this.

No suitable driver found for 
jdbc:cassandra://192.168.1.103:9160/transactionlogdb
at java.sql.DriverManager.getConnection(DriverManager.java:689)
at java.sql.DriverManager.getConnection(DriverManager.java:247)
at 
org.xbib.elasticsearch.river.jdbc.strategy.simple.SimpleRiverSource.getConnectionForReading(SimpleRiverSource.java:133)
at 
org.xbib.elasticsearch.river.jdbc.strategy.simple.SimpleRiverSource.execute(SimpleRiverSource.java:271)

However, I have placed all the necessary jar files for the driver in 
$ES_HOME/plugins/jdbc. Please advise.

Kind Regards
Abhishek

On Friday, June 13, 2014 1:43:45 PM UTC+5:30, Abhishek Mukherjee wrote:

 Hi Everyone,

 I am trying to move data from Cassandra to Elasticsearch. Initially I 
 tried the cassandra-river at https://github.com/eBay/cassandra-river. 
 However, I got a timed-out error which I suspect was originating from the 
 Hector API. I posted a question on this thread: 
 https://groups.google.com/forum/#!searchin/elasticsearch/cassandra/elasticsearch/4oDbkqK3GVA/W9WLK4SS2MEJ

 Moving on, I thought of using the JDBC river at 
 https://github.com/jprante/elasticsearch-river-jdbc with a Java driver 
 for Cassandra. I followed the MySQL example and modified it for Cassandra. 
 I created the river as follows:

 curl -XPUT '192.168.1.103:9200/_river/my_jdbc_river/_meta' -d '{
     "type" : "jdbc",
     "jdbc" : {
         "url" : "jdbc:cassandra://192.168.1.105:9160/transactionlogdb",
         "cql" : "select * from logs"
     }
 }'



 {"_index":"_river","_type":"my_jdbc_river","_id":"_meta","_version":1,"created":true}

 However, I don't find any documents being created in the jdbc index. Am I 
 missing something? Any help or tips would be much appreciated. Thanks in 
 advance.

 Kind Regards,
 Abhishek Mukherjee




How ElasticSearch nodes synchronise in Cluster when nodes have different Index mappings

2014-06-13 Thread Bhupali Kalmegh
Hi,

Kindly help me understand the behaviour of ES nodes in a cluster when the 
nodes have different index mappings.

I have 2 ES nodes, both currently having the same index versions. Now I want 
to upgrade both nodes with the new index mapping.

Scenario 1: Without taking the node down, start mapping changes on Node1. 
During the mapping changes, if any request comes in, suppose it is handled by 
Node1 only.
Now, how & when will Node#1 & Node#2 synchronise?

Scenario 2: Without taking the node down, start mapping changes on Node1. 
During the mapping changes, if any request comes in, suppose it is handled by 
Node2.
Now, how & when will Node#1 & Node#2 synchronise?

I want to know whose data will be available after the mappings are complete 
on both nodes.



Re: Need help, multiple aggregations with filters extremely slow, where to look for optimizations?

2014-06-13 Thread Adrien Grand
Indeed that is what I meant.



Re: does document database means denormalize

2014-06-13 Thread Jilles van Gurp
Yes, definitely think in terms of denormalizing. Joins are hard/expensive 
in elasticsearch, so you need to avoid needing to join by pre-joining. But 
you have other options as well, see 
 http://www.elasticsearch.org/blog/managing-relations-inside-elasticsearch/

So, say you had a person table and an address table in a database, where you 
have a 1:1 relation; that's a no-brainer: shove the address in the person 
index along with the rest of the person data.

If you had another table called company with a 1:n relation to person, it 
gets more tricky. Now you have options.

Option 1: put the company data in the person index. Sure you are copying 
data all over the place but storage is cheap and it is not like you are 
going to have a trillion companies or persons. Your main worry is not space 
but consistency. What happens if you need to change the company details?
Option 2: put the person objects in an array in the company objects. Fine 
as long as you don't need to query for the persons separately.
Option 3: store just the company id in the person index or the person id in 
the company index (array). Now you will end up in situations where you may 
need to join and you'll have to fire many queries and manipulate search 
results to do it, which is slow, tedious to program, and somewhat error 
prone. But for simple use cases you might get away with it.
Option 4: use nested documents to put persons in companies. Now you can use 
nested queries and aggregations, which give you join like benefits. Don't 
use this for massive amounts of nested documents on a single parent.
Option 5: use parent/child documents to give persons a company parent. More 
flexible than nested, and gives you some performance benefits since parent 
and child reside on the same shard. So same as option 3 but faster.
Option 6: compromise: denormalize some but not all of the fields and keep 
things in a separate index as well.
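[Editor's note] The pre-joining described in Option 1 is mechanical; a minimal sketch in Python (all field names are illustrative, not from any real schema):

```python
def denormalize(persons, companies):
    """Pre-join: embed each person's company fields into the person doc."""
    by_id = {c["id"]: c for c in companies}
    docs = []
    for p in persons:
        company = by_id[p["company_id"]]
        # drop the foreign key, copy company fields under a prefix
        doc = {k: v for k, v in p.items() if k != "company_id"}
        doc.update({"company_" + k: v
                    for k, v in company.items() if k != "id"})
        docs.append(doc)
    return docs

persons = [{"name": "Alice", "company_id": 1},
           {"name": "Bob", "company_id": 1}]
companies = [{"id": 1, "name": "Acme"}]
print(denormalize(persons, companies))
```

The consistency worry from the text shows up here directly: renaming "Acme" means re-running this over every affected person document.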

With n:m style relations it gets a bit harder. Probably you don't want to 
index the cartesian product, so you'll need to compromise. Any of the 
options above could work. All depends on how many relations you are really 
managing.

We've actually gotten rid of our database entirely. Once you get used to 
it, thinking in terms of documents is much more natural than thinking in 
terms of rows, tables, and relations. You have much less of an impedance 
mismatch that you need to pretend does not exist with some object 
relational library. It's more like here's an object, serialize it, store 
it, query for it. 

Jilles

On Friday, June 13, 2014 9:48:37 AM UTC+2, eune...@gmail.com wrote:

 What I am asking is  

 Do different design decisions apply in elasticsearch compared  to 
 relational 

 Is denormalized better for elasticsearch





How ElasticSearch nodes synchronise in Cluster when nodes have different Index mappings

2014-06-13 Thread Luis García Acosta
Mapping is applied at cluster level, and an existing index won't get the new 
mapping. You will need to reindex your data, i.e. create a new index after you 
apply the new mapping.



Re: How ElasticSearch nodes synchronise in Cluster when nodes have different Index mappings

2014-06-13 Thread Bhupali Kalmegh
Yes, I am creating a new index and then migrating the data from the older 
index to the new index. So, while this migration is going on, if any request 
comes in, what would be the behaviour?


On Friday, June 13, 2014 3:11:52 PM UTC+5:30, Luis García Acosta wrote:

 Mapping is applied at cluster level, and existing index wont get the new 
 mapping. You will need to reindex your data, aka create a new index after 
 you apply the new mapping



Re: ES 1.2.1 sort by _timestamp

2014-06-13 Thread Stefan Eberl


On Friday, June 13, 2014 10:31:53 AM UTC+2, Itamar Syn-Hershko wrote:

 Possibly, because it's not provided in the _source, or just use this: 
 http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/mapping-timestamp-field.html#_path_2

 So your suggestion is to have my app fill an additional field, which then 
gets mapped to _timestamp, correct?



Re: ES 1.2.1 sort by _timestamp

2014-06-13 Thread Itamar Syn-Hershko
This is just to debug this, to make sure results are indeed not sorted by
_timestamp, as you claim. Probably easier to just set _timestamp to stored.
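[Editor's note] For reference, enabling and storing _timestamp in a 1.x mapping looks roughly like this (type name illustrative):

```json
{
  "mytype": {
    "_timestamp": { "enabled": true, "store": true }
  }
}
```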

--

Itamar Syn-Hershko
http://code972.com | @synhershko https://twitter.com/synhershko
Freelance Developer & Consultant
Author of RavenDB in Action http://manning.com/synhershko/


On Fri, Jun 13, 2014 at 12:49 PM, Stefan Eberl cpppw...@gmail.com wrote:



 On Friday, June 13, 2014 10:31:53 AM UTC+2, Itamar Syn-Hershko wrote:

 Possibly, because it's not provided in the _source, or just use this:
 http://www.elasticsearch.org/guide/en/elasticsearch/
 reference/current/mapping-timestamp-field.html#_path_2

 So your suggestion is to have my app fill an additional field, which then
 gets mapped to _timestamp, correct?





How can we use different analyzers for one field? (Only one analyzer is used at search time, but the search requirements differ)

2014-06-13 Thread Elastic Sowjanya
Hi,
I have the below requirement. Please help me.

I am using *elasticsearch-1.1.0*.


   - In the index, I have n fields and m types.
   - Eg: Types: Person, Book
   - Eg: Fields:
   - Person: Name, age, Email, Phone
   - Book: Name, author, price
   - How do I set the analyzers for all fields and all types?

*My Search Requirement:*
Input:
Person:
Name: john smith
age:30
Email: j...@gmail.com
Phone: (987) 123-4567
--
Name: John Smith
age:30
Email: j...@gmail.com
Phone: (879) 123-4567

Name: django$haystack
age:30
Email: j...@gmail.com
Phone: (987) 123-4567
---
Name: django#haystack
age:30
Email: j...@gmail.com
Phone: (987) 123-4567

*Scenario1:* 
*Search String*: django#haystack
*Results*: django$haystack, django#haystack
But the expected result is *django#haystack* only.

*Scenario2:*
*Search String: *John Smith
*Results*: John Smith,john smith
Expected Results are John Smith
*Scenario3:*
*Search String: *John Smith
*Results*: John Smith, john smith
*This is fine. But we need to support Scenario2 as well. How can we support 
Scenario2 and Scenario3 using analyzers?*


*Please Help me in this.*
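[Editor's note] For exact matching alongside analyzed search, a multi-field mapping is the usual approach: keep the analyzed field for Scenario3-style matching and query a not_analyzed sub-field for Scenario1/2-style exact matching. A sketch (type and field names are illustrative, untested):

```json
{
  "person": {
    "properties": {
      "Name": {
        "type": "string",
        "fields": {
          "raw": { "type": "string", "index": "not_analyzed" }
        }
      }
    }
  }
}
```

Queries against Name would match both casings; queries against Name.raw would match only the exact string.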



Re: How ElasticSearch nodes synchronise in Cluster when nodes have different Index mappings

2014-06-13 Thread Mark Walkom
That depends on how you do the migration, it's not something ES handles
automatically, you need to do it yourself.
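[Editor's note] Reindexing into a new index is typically done with scan/scroll on the old index plus the bulk API on the new one. The transform from scroll hits to bulk action lines can be sketched as a pure function (illustrative only; the actual scrolling and HTTP calls are omitted):

```python
import json

def hits_to_bulk(hits, new_index):
    """Turn scan/scroll hits into newline-delimited bulk index actions."""
    lines = []
    for h in hits:
        # action line naming the target index, type, and id
        lines.append(json.dumps({"index": {"_index": new_index,
                                           "_type": h["_type"],
                                           "_id": h["_id"]}}))
        # source line carrying the document body
        lines.append(json.dumps(h["_source"]))
    return "\n".join(lines) + "\n"

hits = [{"_type": "doc", "_id": "1", "_source": {"field": "value"}}]
print(hits_to_bulk(hits, "myindex-v2"))
```

The resulting string is what would be POSTed to the new index's _bulk endpoint, batch by batch, as each scroll page arrives.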

Regards,
Mark Walkom

Infrastructure Engineer
Campaign Monitor
email: ma...@campaignmonitor.com
web: www.campaignmonitor.com


On 13 June 2014 19:47, Bhupali Kalmegh bhupali...@gmail.com wrote:

 Yes, I am creating new index and then migrating the data from older index
 to new index. So, when this migration is going on, if any request comes,
 then what would be the behaviour?



 On Friday, June 13, 2014 3:11:52 PM UTC+5:30, Luis García Acosta wrote:

 Mapping is applied at cluster level, and existing index wont get the new
 mapping. You will need to reindex your data, aka create a new index after
 you apply the new mapping





Re: Cassandra with JDBC river plugin

2014-06-13 Thread joergpra...@gmail.com
The Cassandra Java Driver is not a JDBC driver.

Jörg


On Fri, Jun 13, 2014 at 11:11 AM, Abhishek Mukherjee 4271...@gmail.com
wrote:

 Checking the Elasticsearch log files I found this.

 No suitable driver found for jdbc:cassandra://
 192.168.1.103:9160/transactionlogdb
 at java.sql.DriverManager.getConnection(DriverManager.java:689)
 at java.sql.DriverManager.getConnection(DriverManager.java:247)
 at
 org.xbib.elasticsearch.river.jdbc.strategy.simple.SimpleRiverSource.getConnectionForReading(SimpleRiverSource.java:133)
 at
 org.xbib.elasticsearch.river.jdbc.strategy.simple.SimpleRiverSource.execute(SimpleRiverSource.java:271)

 However I have placed all the necessary jar files for the driver in
 $ES_HOME/plugins/jdbc. Please advice.

 Kind Regards
 Abhishek







Re: Query multiple strings in a field in kibana3?

2014-06-13 Thread Mark Walkom
You can save dashboards with the query, if that is what you want. You will
need to save one per query though.

Regards,
Mark Walkom

Infrastructure Engineer
Campaign Monitor
email: ma...@campaignmonitor.com
web: www.campaignmonitor.com


On 13 June 2014 18:15, Siddharth Trikha siddharthtrik...@gmail.com wrote:

  I am using Logstash 1.4.1, elasticsearch 1.1.1 and kibana 3.1 for analyzing
 my logs. I get the parsed fields (from the log) in Kibana 3.

 Now, I often have to query a particular field for many strings. E.g.
 auth_message is a field and I may have to query for around 20 different
 strings (all together or separately).

 If together:

 auth_message: ("login failed" OR "user XYZ" OR "authentication failure" OR
 ...)

 If separate queries:

 auth_message: "login failed"
 auth_message: "user XYZ"
 auth_message: "authentication failure"

 So a user cannot remember 20 strings per field to search for. Is
 there a way to store them, or present them to the user so he can select the
 strings he wants to search for?

 Can this be done using ELK? Please help.





Re: RepositoryMissingException

2014-06-13 Thread Shawn Mullen
good question.  that is what is being returned when I make the call.  but
your question gave me an idea as to what the problem is.

thanks.
On Jun 12, 2014 11:32 PM, David Pilato da...@pilato.fr wrote:

 What is this "-d" in "statlogs -d"?

 --
 David ;-)
 Twitter : @dadoonet / @elasticsearchfr / @scrutmydocs


 Le 13 juin 2014 à 03:58, Shawn Mullen shawnmull...@gmail.com a écrit :

 I have an ElasticSearch instance running on my local machine. I installed
 the S3 plugin so I can do backup and restore operations to/from S3.

 I tried to follow the documentation on how to set this up.  I was able to
 register a snapshot repository and I have a bucket in S3 created just for
 backups.  When I do a /_all I see the current repo settings. So, at this
 point all looks fine.  However, when I try to create a snapshot it fails
 with RepositoryMissingException.

 This is what I get for a /_all:

 {
    "statlogs -d": {
       "type": "s3",
       "settings": {
          "region": "us-east",
          "bucket": "my-bucket-name",
          "access_key": "my-access-key",
          "secret_key": "my-secret-key"
       }
    }
 }

 This is what I am sending when I try to do a snapshot:

 PUT /_snapshot/statlogs/snapshot_1 -d
 {
     "indices": ["statexceptionlog"],
     "ignore_unavailable": true,
     "include_global_state": false
 }

 I am using Sense to send the commands.

 I'm assuming I am getting the error because of something wrong with my S3
 settings but I don't know what it would be.  I'm making this assumption
 because the /_all returns data (but I guess that could be wrong).  Any
 ideas on what the issue might be?  What exactly causes
 RepositoryMissingException?

 Thanks.






Email alert after threshold crossed logstash?

2014-06-13 Thread Siddharth Trikha
 

I am using logstash, elasticsearch and kibana to analyse my logs. I alert 
via the email output in logstash when a particular string appears in the 
log: 

email {
  match => [ "Session Detected", "logline,*Session closed*" ]
  ...
}

This works fine.

Now, I want to alert on the count of a field (when a threshold is crossed). 
E.g. if "user" is a field, I want to alert when the number of unique users 
exceeds 5. 

Can this be done via the email output in logstash?
Please help.
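As background, the logstash metrics filter can maintain event counters that an output conditional can then test. A hedged sketch (field names and the threshold are placeholders; note this counts matching events, not unique users, which the metrics filter does not track directly):

```conf
filter {
  # Count every event that carries a user field (placeholder condition)
  if [user] {
    metrics {
      meter   => "user_events"
      add_tag => "metric"
    }
  }
}

output {
  # Alert when the 1-minute rate of such events crosses the placeholder threshold
  if "metric" in [tags] and [user_events][rate_1m] > 5 {
    email {
      # ... smtp / recipient settings elided ...
    }
  }
}
```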



Re: Query multiple strings in a field in kibana3?

2014-06-13 Thread Siddharth Trikha
So there is no way to store the query itself? Will I have to save the entire dashboard?


On Fri, Jun 13, 2014 at 4:35 PM, Mark Walkom ma...@campaignmonitor.com
wrote:

 You can save dashboards with the query, if that is what you want. You will
 need to save one per query though.

 Regards,
 Mark Walkom

 Infrastructure Engineer
 Campaign Monitor
 email: ma...@campaignmonitor.com
 web: www.campaignmonitor.com


 On 13 June 2014 18:15, Siddharth Trikha siddharthtrik...@gmail.com
 wrote:

  I am using Logstash 1.4.1, elasticsearch 1.1.1, kibana 3.1 for
 analyzing my logs. I get the parsed fields (from log) in Kibana 3.

  Now, I often have to query a particular field for many strings. E.g.:
 auth_message is a field and I may have to query for around 20 different
 strings (all together or separately).

 If together:

 auth_message: "login failed" OR "user XYZ" OR "authentication failure" OR ...

 If separate queries:

 auth_message: "login failed"
 auth_message: "user XYZ"
 auth_message: "authentication failure"

 A user cannot remember 20 strings to search a field for. Is there a way to
 store them, or present them so the user can select the strings he wants
 to search for?

 Can this be done using ELK? Please help.







-- 
Regards
Siddharth Trikha



Re: Runtime JRE?

2014-06-13 Thread joergpra...@gmail.com
Yes, you can use the Java Server JRE. It is a build without the Java desktop
graphics libraries (i.e. a headless JVM).

Jörg


On Fri, Jun 13, 2014 at 1:53 PM, thatguy1...@gmail.com wrote:

 I know the guide says the following:

 "While a JRE can be used for the Elasticsearch service, due to its use of a
 client VM (as opposed to a server JVM which offers better performance for
 long-running applications) its usage is discouraged and a warning will be
 issued."

 But I noticed something on Oracle's page called the Server JRE. Does anyone
 know if this is equivalent to the server JVM at runtime?

 Steve





Configuring YML files Location

2014-06-13 Thread karthik jayanthi
Hi,

I am trying to set up the configuration of ES (elasticsearch.yml and
logging.yml) outside of the elasticsearch package. I have put the two files
in a separate location and pointed CONF_DIR to that location. I launch
the ES server by specifying the cluster name and node name.

The problem I am seeing is that this configuration is not getting
picked-up. I verified this by checking the logs files. The log files get
updated when I have the yml config files in the ES directory. But when I
move them out, the logs don't get updated.


Any pointers on how to configure the yml files' location outside of the
ES package?


Thanks,
Karthik Jayanthi



Re: compresstion in ES 1.2.1

2014-06-13 Thread sri
Hello Jorg,

I am sorry, there was some problem in the implementation at my end. Thanks 
a lot guys for the insight and help.
Appreciate the quick responses.

Thanks and Regards
Sri

On Sunday, June 8, 2014 5:04:24 PM UTC-4, sri wrote:

 Hello Jorg,

 Thanks a lot for the info. I tried applying the template you provided, 
 but the size is not reducing. On the other hand, I noticed a decrease in 
 size when I disabled the fields via the Mapping API.

 Thanks and Regards
 Sri

 On Sunday, June 8, 2014 4:37:58 PM UTC-4, Jörg Prante wrote:

 Try this index template for new index creations

 curl -XPUT 'localhost:9200/_template/template1' -d '
 {
   "template" : "*",
   "mappings" : {
     "_default_" : {
       "_source" : { "enabled" : false },
       "_all" : { "enabled" : false }
     }
   }
 }
 '

 See also 


 http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/indices-templates.html

 You cannot disable _all or _source in an existing index.

 Jörg



 On Sun, Jun 8, 2014 at 10:22 PM, sri 1.fr...@gmail.com wrote:

 Thanks a lot for the insight Patrick. 

 I have a few more queries:

- Is it possible to disable the '_source' and '_all' fields by 
default for all the indices that will be created later (possibly 
defined in the elasticsearch.yml file)? 
- What happens if my index is already created and I then disable the 
'_source' and '_all' fields? Would that affect the file size of the 
index, i.e., will the fields be removed/disabled only for the documents 
added after disabling them? 

 Thanks and Regards
 Sri

 On Sunday, June 8, 2014 2:48:16 PM UTC-4, Patrick Proniewski wrote:

 Hello, 

 I don't know how it's compressed, but it appears that data is compressed 
 up to an amount of 4k, i.e. it's useless to store data on a compressed (lz4) 
 filesystem if the fs block size is 4k: 

 Filesystem     Size    Used   Avail  Capacity  Mounted on 
 zdata/ES-lz4   1.1T    1.9G   1.1T   0%        /zdata/ES-lz4 
 zdata/ES       1.1T    1.9G   1.1T   0%        /zdata/ES 

 But if the fs block size is greater (say 128k), filesystem compression is a 
 huge win: 

 Filesystem     Size    Used   Avail  Capacity  Mounted on 
 zdata/ES-lz4   1.1T    1.1G   1.1T   0%        /zdata/ES-lz4   (compressratio 1.73x) 
 zdata/ES-gzip  1.1T    901M   1.1T   0%        /zdata/ES-gzip  (compressratio 2.27x) 
 zdata/ES       1.1T    1.9G   1.1T   0%        /zdata/ES 

 Unfortunately, a filesystem block size greater than 4K is not optimal 
 for IO (unless you have a big amount of physical memory you can dedicate 
 to 
 filesystem data cache, which would be redundant with ES cache). 
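As a hedged sketch of how such a comparison could be set up on ZFS (pool and dataset names are placeholders; compressratio is what ZFS reports after data has been written):

```shell
# Create datasets with different compression algorithms and record sizes
zfs create -o compression=lz4  -o recordsize=128k zdata/ES-lz4
zfs create -o compression=gzip -o recordsize=128k zdata/ES-gzip

# Inspect the achieved compression ratios
zfs get compressratio zdata/ES-lz4 zdata/ES-gzip
```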



 On 08 June 2014, at 18:41, David Pilato wrote: 

  It's compressed by default now. 
  
  -- 
  David ;-) 
  Twitter : @dadoonet / @elasticsearchfr / @scrutmydocs 
  
  
  On 8 June 2014 at 18:01, sri 1.fr...@gmail.com wrote: 
  
  Hello everyone, 
  
  I have read posts and blogs on how elasticsearch compression could be 
 enabled in previous versions (0.17 - 0.19). 
  
  I am currently using ES 1.2.1; I wasn't able to find out how to 
 enable compression in this version, or whether there is any such option 
 at all. 
  
  I know that I can reduce the storage amount by disabling the source 
 using the mapping API, but what I was interested in is the compression of 
 data storage. 
  
  Thanks and Regards 
  Sri 







No Node Available

2014-06-13 Thread Marcelo Paes Rech
Hi guys,

I googled NoNodeAvailableException, but none of the answers addressed my 
question so far. I'm getting this error when the ES connections between 
server and client are idle for a long time. I looked at the number of 
connections on port 9300 and there is a huge number of open sockets 
(something like 700 connections). But if I count every client connection, 
the number should be 32.

Every morning I get this exception once, and then everything works fine, 
without any more exceptions.

My client configurations follow:
client.transport.sniff: false
client.transport.ping_timeout: 30s
client.transport.nodes_sampler_interval: 5s

Regards.



Re: Marvel 1.2.0 java.lang.IllegalStateException

2014-06-13 Thread John Smith
Is this released? Or is it still on GitHub?

Experiencing the same thing...

Ran the commands from above...

http://pastebin.com/WUTTLgsS


On Monday, 9 June 2014 14:44:17 UTC-4, Paweł Krzaczkowski wrote:

 It works .. thx for a quick fix


 2014-06-09 17:48 GMT+02:00 Paweł Krzaczkowski pa...@krzaczkowski.pl:

 I'm out of the office for today, so I'll test it tomorrow morning and let you 
 know if it works

 pawel (at) mobile

 On 9 June 2014, at 17:40, Boaz Leskes b.le...@gmail.com wrote:

 Hi Pawel,

 We just did a quick minor release to marvel with a fix for this. Would be 
 great if you can give it a try and confirm how it goes.

 Cheers,
 Boaz

 On Friday, June 6, 2014 12:01:52 PM UTC+2, Boaz Leskes wrote:

 Thx Pawel,

 Not huge, but larger than the limit. Working on a fix.


 On Friday, June 6, 2014 10:10:45 AM UTC+2, Paweł Krzaczkowski wrote:

 This one is without metadata

 http://pastebin.com/tmJGA5Kq
 http://xxx:9200/_cluster/state/version,master_node,nodes,routing_table,blocks/?human&pretty

 Pawel

 On Friday, 6 June 2014 at 09:28:30 UTC+2, Boaz Leskes wrote:

 HI Pawel,

  I see - your cluster state (nodes + routing only, not metadata) 
 seems to be larger than 16KB when rendered to SMILE, which is quite big - 
 does this make sense?

  Above 16KB, an underlying paging system introduced in the ES 1.x branch 
 kicks in, and that breaks something in Marvel, which normally ships very 
 small documents.

 I'll work on a fix. Can you confirm your cluster state (again, without 
 the metadata) is indeed very large?

 Cheers,
 Boaz

 On Thursday, June 5, 2014 10:56:00 AM UTC+2, Paweł Krzaczkowski wrote:

 Hi.

 After upgrading Marvel to 1.2.0 (running on Elasticsearch 1.2.1) i'm 
 getting errors like

 [2014-06-05 10:47:25,346][INFO ][node ] [es-m-3] 
 version[1.2.1], pid[68924], build[6c95b75/2014-06-03T15:02:52Z]
 [2014-06-05 10:47:25,347][INFO ][node ] [es-m-3] 
 initializing ...
 [2014-06-05 10:47:25,367][INFO ][plugins  ] [es-m-3] 
 loaded [marvel, analysis-icu], sites [marvel, head, segmentspy, browser, 
 paramedic]
 [2014-06-05 10:47:28,455][INFO ][node ] [es-m-3] 
 initialized
 [2014-06-05 10:47:28,456][INFO ][node ] [es-m-3] 
 starting ...
 [2014-06-05 10:47:28,597][INFO ][transport] [es-m-3] 
 bound_address {inet[/0:0:0:0:0:0:0:0:9300]}, publish_address {inet[/
 192.168.0.212:9300]}
 [2014-06-05 10:47:42,340][INFO ][cluster.service  ] [es-m-3] 
 new_master [es-m-3][0H3grrJxTJunU1U6FmkIEg][es-m-3][inet[
 192.168.0.212/192.168.0.212:9300]]{data=false 
 http://192.168.0.212/192.168.0.212:9300%5D%5D%7Bdata=false, 
 master=true}, reason: zen-disco-join (elected_as_master)
 [2014-06-05 10:47:42,350][INFO ][discovery] [es-m-3] 
 freshmind/0H3grrJxTJunU1U6FmkIEg
 [2014-06-05 10:47:42,365][INFO ][http ] [es-m-3] 
 bound_address {inet[/0:0:0:0:0:0:0:0:9200]}, publish_address {inet[/
 192.168.0.212:9200]}
 [2014-06-05 10:47:42,368][INFO ][node ] [es-m-3] 
 started
 [2014-06-05 10:47:44,098][INFO ][cluster.service  ] [es-m-3] 
 added 
 {[es-m-1][MHl5Ls-cRXCwc7OC-P0J5w][es-m-1][inet[/192.168.0.210:9300]]{data=false,
  
 machine=44454c4c-5300-1052-8038-b9c04f5a5a31, master=true},}, 
 reason: zen-disco-receive(join from node[[es-m-1][MHl5Ls-
 cRXCwc7OC-P0J5w][es-m-1][inet[/192.168.0.210:9300]]{data=false, 
 machine=44454c4c-5300-1052-8038-b9c04f5a5a31, master=true}])
 [2014-06-05 10:47:44,401][INFO ][gateway  ] [es-m-3] 
 recovered [28] indices into cluster_state
 [2014-06-05 10:47:48,683][ERROR][marvel.agent ] [es-m-3] 
 exporter [es_exporter] has thrown an exception:
 java.lang.IllegalStateException: array not available
 at org.elasticsearch.common.bytes.PagedBytesReference.
 array(PagedBytesReference.java:289)
 at org.elasticsearch.marvel.agent.exporter.ESExporter.
 addXContentRendererToConnection(ESExporter.java:209)
 at org.elasticsearch.marvel.agent.exporter.ESExporter.
 exportXContent(ESExporter.java:252)
 at org.elasticsearch.marvel.agent.exporter.ESExporter.
 exportEvents(ESExporter.java:161)
 at org.elasticsearch.marvel.agent.AgentService$
 ExportingWorker.exportEvents(AgentService.java:305)
 at org.elasticsearch.marvel.agent.AgentService$
 ExportingWorker.run(AgentService.java:240)
 at java.lang.Thread.run(Thread.java:745)
 [2014-06-05 10:47:58,738][ERROR][marvel.agent ] [es-m-3] 
 exporter [es_exporter] has thrown an exception:
 java.lang.IllegalStateException: array not available
 at org.elasticsearch.common.bytes.PagedBytesReference.
 array(PagedBytesReference.java:289)
 at org.elasticsearch.marvel.agent.exporter.ESExporter.
 addXContentRendererToConnection(ESExporter.java:209)
 at org.elasticsearch.marvel.agent.exporter.ESExporter.
 

Re: Marvel 1.2.0 java.lang.IllegalStateException

2014-06-13 Thread Paweł Krzaczkowski
Hi,

Yes it's been released as Marvel 1.2.1


2014-06-13 16:01 GMT+02:00 John Smith java.dev@gmail.com:

 Is this released? Or is it still on GitHub?

 Experiencing the same thing...

 Ran the commands from above...

 http://pastebin.com/WUTTLgsS



Re: Marvel 1.2.0 java.lang.IllegalStateException

2014-06-13 Thread John Smith
Ok works thanks

On Friday, 13 June 2014 10:02:06 UTC-4, Paweł Krzaczkowski wrote:

 Hi,

 Yes it's been released as Marvel 1.2.1



Re: Cannot Increase Write TPS in Elasticsearch by adding more nodes

2014-06-13 Thread Greg Murnane
I haven't seen it asked yet: what is feeding data into your elasticsearch? 
Depending on what you're doing to get it there, a large document size could 
easily bottleneck some feeding mechanisms. It's also notable that some 
"green" spinning disks top out in the realm of 72MB/s. It might be useful 
to make sure that your feeding mechanism can handle more than 500 TPS.




Changing Kibana-int based on context

2014-06-13 Thread mysterydark


I am a newbie to computer science in general, and at present I am working on 
a project which involves Elasticsearch, Logstash, and Kibana to build a 
centralized logging system. In Kibana's config.js there is a parameter 
kibana_index whose default value is set to kibana-int. Is it possible to 
change the value of kibana_index based on context? What I could understand 
from my research is that kibana-int is the index which stores all the 
dashboards. By context, I mean that if I have multiple projects in an 
organization, the dropdown on the Kibana dashboard page should show only the 
dashboards under a particular project when I give that project's name as the 
context in my URL. So people working on a project get to see only the 
dashboards in their project. The only way I could find is to change the 
kibana_index value based on the project, to something like kibana-projA, so 
it shows all the dashboards under that particular index. But I couldn't find 
a way to do it. Could you please help me out?

Any help would be appreciated.

Thanks.



Percolation limits

2014-06-13 Thread Maciej Dziardziel
Hi

I wanted to ask those who use percolation: how many queries are you 
percolating?

I need to set up some equivalent of percolation for about 100k queries. 
With some filtering, probably up to 10k would actually have to be checked 
for each new document. Is the idea of using ES percolation for that insane?

Thanks
Maciej Dziardziel
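For context, a hedged sketch of the percolator API in the ES 1.x line that the question refers to (index, type, id, and queries are placeholders):

```shell
# Register a query under the special .percolator type (names are placeholders)
curl -XPUT 'localhost:9200/myindex/.percolator/alert-1' -d '{
  "query": { "match": { "body": "outage" } }
}'

# Percolate a new document against all registered queries; the response
# lists the ids of the queries that match the document.
curl -XGET 'localhost:9200/myindex/mytype/_percolate' -d '{
  "doc": { "body": "major outage reported in zone A" }
}'
```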



Kibana 3 and changing the default field from _all to message

2014-06-13 Thread Brian
I have this typical document being indexed by logstash. The following shows 
the document in rubydebug mode and not as JSON, but when converted to JSON 
and indexed the field names and values are the same (in other words, the 
syntax below isn't one-line JSON but it's clearer to read):

{
       "message" => "2014-06-13 16:15:18,431 foo=1 bar=3 text=\"quoted strings work\" assist=true",
      "@version" => "1",
    "@timestamp" => "2014-06-13T16:15:18.431Z",
          "host" => "blacksheep",
           "foo" => "1",
           "bar" => "3",
          "text" => "quoted strings work",
        "assist" => "true"
}

In preparation for the best possible performance, I disabled the _all field 
from all my logstash-* indices. It isn't needed, as the message field 
contains all of the original message's text anyway. And the _all field 
wastes time during indexing and space on disk.

But all of the answers to the question "How can I configure Kibana to use 
the message field as the default and not the _all field?" seem to apply to 
Kibana 1 and 2, the Ruby versions. There is no RubyConfig.rb file in Kibana 
3. And I cannot find any reference to the _all field, only to all indices 
(which I broke nicely when fumbling around; it applied only to indices, as I 
quickly discovered).

Telling people to query for message:work instead of just work does not 
endear me to them.

Is there some way to configure Kibana 3 to change the default field in its 
Lucene query to message instead of _all?

Thank you in advance!

Brian

-- 
You received this message because you are subscribed to the Google Groups 
elasticsearch group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/b7b6d6d1-5690-496e-bdb1-1ee33b027b12%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


Re: Non-Uniform Drive Space Across Nodes

2014-06-13 Thread ES USER
OP here.  My numbers on the disk space were not an actual observation of 
current sizes.  It was more of a hypothetical: what can I expect ES to do 
if I only had three servers and that was the starting disk space available 
in each?

-- 
You received this message because you are subscribed to the Google Groups 
elasticsearch group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/ca665ae3-393e-4f79-8a31-60130751938f%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


Re: Configuring YML files Location

2014-06-13 Thread Brian
For example, I keep my Elasticsearch configurations for use with the ELK 
stack within this directory:

/opt/config/elk/current

So my start-up script calls the elasticsearch command as follows:

$ES_HOME/elasticsearch -d ... -Des.path.conf=/opt/config/elk/current ...

Hope this helps!

Brian

-- 
You received this message because you are subscribed to the Google Groups 
elasticsearch group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/bc66b0cc-7505-4d43-87c8-d6ce1d87851b%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


Re: Cassandra with JDBC river plugin

2014-06-13 Thread Abhishek Mukherjee
OK, thanks. It seems I have to make the Cassandra river work.
On 13 Jun 2014 16:34, joergpra...@gmail.com joergpra...@gmail.com wrote:

 The Cassandra Java Driver is not a JDBC driver.

 Jörg


 On Fri, Jun 13, 2014 at 11:11 AM, Abhishek Mukherjee 4271...@gmail.com
 wrote:

 Checking the Elasticsearch log files I found this.

 No suitable driver found for jdbc:cassandra://
 192.168.1.103:9160/transactionlogdb
 at java.sql.DriverManager.getConnection(DriverManager.java:689)
 at java.sql.DriverManager.getConnection(DriverManager.java:247)
 at
 org.xbib.elasticsearch.river.jdbc.strategy.simple.SimpleRiverSource.getConnectionForReading(SimpleRiverSource.java:133)
 at
 org.xbib.elasticsearch.river.jdbc.strategy.simple.SimpleRiverSource.execute(SimpleRiverSource.java:271)

 However I have placed all the necessary jar files for the driver in
 $ES_HOME/plugins/jdbc. Please advise.

 Kind Regards
 Abhishek


 On Friday, June 13, 2014 1:43:45 PM UTC+5:30, Abhishek Mukherjee wrote:

 Hi Everyone,

 I am trying to move data from Cassandra to Elasticsearch. Initially I
 tried the cassandra-river at https://github.com/eBay/cassandra-river.
 However I got a timed-out error which I suspect was originating from the
 Hector API. I posted a question on this thread:
 https://groups.google.com/forum/#!searchin/elasticsearch/cassandra/elasticsearch/4oDbkqK3GVA/W9WLK4SS2MEJ

 Moving on I thought of using the JDBC-river at
 https://github.com/jprante/elasticsearch-river-jdbc with a java driver
 for cassandra.  I followed the mysql example and modified it for cassandra.
 I created the river using as follows:

 curl -XPUT '192.168.1.103:9200/_river/my_jdbc_river/_meta' -d '{
     "type" : "jdbc",
     "jdbc" : {
         "url" : "jdbc:cassandra://192.168.1.105:9160/transactionlogdb",
         "cql" : "select * from logs"
     }
 }'


 {"_index":"_river","_type":"my_jdbc_river","_id":"_meta","_version":1,"created":true}

 However I don't find any documents being created on the jdbc index. Am I
 missing something? Any help or tips would be very much appreciated. Thanks in
 advance.

 Kind Regards,
 Abhishek Mukherjee

  --
 You received this message because you are subscribed to the Google Groups
 elasticsearch group.
 To unsubscribe from this group and stop receiving emails from it, send an
 email to elasticsearch+unsubscr...@googlegroups.com.
 To view this discussion on the web visit
 https://groups.google.com/d/msgid/elasticsearch/a5dc2bf7-2380-4425-893a-a44e587ba9ce%40googlegroups.com
 https://groups.google.com/d/msgid/elasticsearch/a5dc2bf7-2380-4425-893a-a44e587ba9ce%40googlegroups.com?utm_medium=emailutm_source=footer
 .

 For more options, visit https://groups.google.com/d/optout.


  --
 You received this message because you are subscribed to a topic in the
 Google Groups elasticsearch group.
 To unsubscribe from this topic, visit
 https://groups.google.com/d/topic/elasticsearch/iU_JRwxl6ZI/unsubscribe.
 To unsubscribe from this group and all its topics, send an email to
 elasticsearch+unsubscr...@googlegroups.com.
 To view this discussion on the web visit
 https://groups.google.com/d/msgid/elasticsearch/CAKdsXoGf1GjB8MSefx6ZC1OYD0b6Xf%3DKX%2BpDHcYe7cvZVmeyJg%40mail.gmail.com
 https://groups.google.com/d/msgid/elasticsearch/CAKdsXoGf1GjB8MSefx6ZC1OYD0b6Xf%3DKX%2BpDHcYe7cvZVmeyJg%40mail.gmail.com?utm_medium=emailutm_source=footer
 .
 For more options, visit https://groups.google.com/d/optout.


-- 
You received this message because you are subscribed to the Google Groups 
elasticsearch group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/CAMjqjp4RYqhCP_BwGcA08WPtsc29AFj8UB%2Boi2tyTpY%2BPZouMg%40mail.gmail.com.
For more options, visit https://groups.google.com/d/optout.


Re: exclude some documents (and category filter combination) for some queries

2014-06-13 Thread Ivan Brusic
Currently not possible. Elasticsearch will return all the nested documents
as long as one of the nested documents satisfies the query.

https://github.com/elasticsearch/elasticsearch/issues/3022

That issue is my personal #1 feature request. Frustrating considering
there has been a working implementation since version 0.90.5: 1.0, 1.1, 1.2,
and still nothing.
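As a side note on why the attempted filter in the quoted query below matches nothing: with products mapped as nested, its fields normally have to be addressed through a nested filter using the full path; a bare term filter on category sees nothing. A hedged sketch of the filter section (ES 1.x syntax, untested against this exact mapping):

```json
"filter": {
  "not": {
    "nested": {
      "path": "products",
      "filter": { "terms": { "products.category": [1, 3] } }
    }
  }
}
```

Note this drops every design containing any product in categories 1 or 3 (including designId 100 from the expected results); returning a design while trimming only the offending nested products is exactly the missing feature tracked in the issue above.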

-- 
Ivan




On Thu, Jun 12, 2014 at 2:17 PM, Srinivasan Ramaswamy ursva...@gmail.com
wrote:

 Any thoughts, anyone?


 On Wednesday, June 11, 2014 11:15:18 PM UTC-7, Srinivasan Ramaswamy wrote:

 I would like to exclude documents belonging to certain categories from
 the results, but only for certain search queries. I have an ES client layer where
 I am thinking of implementing this logic as a "not" filter depending on the
 search query. Let me give an example.

 sample index

 { "designId": 100,
   "tags": ["dog", "cute"],
   "caption": "cute dog in the garden",
   "products": [ { "productId": 200, "category": 1 }, { "productId": 201, "category": 2 } ] }

 { "designId": 101,
   "tags": ["brown", "dog"],
   "caption": "little brown dog",
   "products": [ { "productId": 202, "category": 3 } ] }

 { "designId": 102,
   "tags": ["black", "dog"],
   "caption": "little black dog",
   "products": [ { "productId": 202, "category": 4 }, { "productId": 203, "category": 5 } ] }

 products is a nested field inside each design.

 I would like to write a query to get all matches for "dog" (not for
 other keywords) but filter out a few categories from the result. As ES
 returns the whole nested document even if only one nested document matches
 the query, my expected result is:

 { "designId": 100,
   "tags": ["dog", "cute"],
   "caption": "cute dog in the garden",
   "products": [ { "productId": 200, "category": 1 }, { "productId": 201, "category": 2 } ] }

 { "designId": 102,
   "tags": ["black", "dog"],
   "caption": "little black dog",
   "products": [ { "productId": 202, "category": 4 }, { "productId": 203, "category": 5 } ] }
 Here is the query I tried, but it doesn't work. Can anyone help me point
 out the mistake?

 GET /_search/
 {
   "query": {
     "filtered": {
       "filter": {
         "and": [
           { "not": { "term": { "category": 1 } } },
           { "not": { "term": { "category": 3 } } }
         ]
       },
       "query": {
         "multi_match": {
           "query": "dog",
           "fields": [ "tags", "caption" ],
           "minimum_should_match": "50%"
         }
       }
     }
   }
 }

  --
 You received this message because you are subscribed to the Google Groups
 elasticsearch group.
 To unsubscribe from this group and stop receiving emails from it, send an
 email to elasticsearch+unsubscr...@googlegroups.com.
 To view this discussion on the web visit
 https://groups.google.com/d/msgid/elasticsearch/45fbf85d-4d29-4222-a72a-bf0a04d9a26d%40googlegroups.com
 .

 For more options, visit https://groups.google.com/d/optout.


-- 
You received this message because you are subscribed to the Google Groups 
elasticsearch group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/CALY%3DcQAfwARsZ7uGKkBf%2BH10jhrdw4dr5nxvHEK_FDUwQv%2BpQw%40mail.gmail.com.
For more options, visit https://groups.google.com/d/optout.


Re: issue of elasticsearch-hadoop-2.0.0 with Hive (cloudera and hortonworks), helps are needed

2014-06-13 Thread Costin Leau

Hi,

Sorry for the delayed response, travel and other things got in the way. I have tried replicating the issue on my end and 
couldn't; see below:


On 6/8/14 8:03 PM, elitem way wrote:

I am learning the elasticsearch-hadoop. I have a few issues that I do not 
understand. I am using ES 1.12 on Windows,
elasticsearch-hadoop-2.0.0 and cloudera-quickstart-vm-5.0.0-0-vmware sandbox 
with Hive.

1. I loaded only 6 rows to the ES index cars/transactions. Why did Hive return 14 
rows instead? See below.
2. select count(*) from cars2 failed with code 2. Group by and sum also 
failed. Did I miss anything? Similar
queries are successful when using the sample_07 and sample_08 tables that come with 
Hive.
3. elasticsearch-hadoop-2.0.0 does not seem to work with jetty, the 
authentication plugin. I got errors when I enabled
jetty and set 'es.nodes' = 'superuser:admin@192.168.128.1'
4. I could not pipe data from Hive to Elasticsearch either.

--ISSUE 1:
--load data to ES
POST: http://localhost:9200/cars/transactions/_bulk
{ "index": {}}
{ "price" : 3, "color" : "green", "make" : "ford", "sold" : "2014-05-18" }
{ "index": {}}
{ "price" : 15000, "color" : "blue", "make" : "toyota", "sold" : "2014-07-02" }
{ "index": {}}
{ "price" : 12000, "color" : "green", "make" : "toyota", "sold" : "2014-08-19" }
{ "index": {}}
{ "price" : 2, "color" : "red", "make" : "honda", "sold" : "2014-11-05" }
{ "index": {}}
{ "price" : 8, "color" : "red", "make" : "bmw", "sold" : "2014-01-01" }
{ "index": {}}
{ "price" : 25000, "color" : "blue", "make" : "ford", "sold" : "2014-02-12" }

CREATE EXTERNAL TABLE cars2 (color STRING, make STRING, price BIGINT, sold 
TIMESTAMP)
STORED BY 'org.elasticsearch.hadoop.hive.EsStorageHandler'
TBLPROPERTIES('es.resource' = 'cars/transactions',
'es.nodes' = '192.168.128.1', 'es.port'='9200');

HIVE: select * from cars2;
14 rows returned.

   color make price sold
0 red honda 2 2014-11-05 00:00:00.0
1 red honda 1 2014-10-28 00:00:00.0
2 green ford 3 2014-05-18 00:00:00.0
3 green toyota 12000 2014-08-19 00:00:00.0
4 blue ford 25000 2014-02-12 00:00:00.0
5 blue toyota 15000 2014-07-02 00:00:00.0
6 red bmw 8 2014-01-01 00:00:00.0
7 red honda 1 2014-10-28 00:00:00.0
8 blue toyota 15000 2014-07-02 00:00:00.0
9 red honda 2 2014-11-05 00:00:00.0
10 green ford 3 2014-05-18 00:00:00.0
11 green toyota 12000 2014-08-19 00:00:00.0
12 red honda 2 2014-11-05 00:00:00.0
13 red honda 2 2014-11-05 00:00:00.0
14 red bmw 8 2014-01-01 00:00:00.0




It looks like you are adding data to localhost:9200 but querying on 192.168.128.1:9200 - most likely they are different, 
hence
the different data set. To double check, do a query/count through curl on ES and then check the data through Hive - 
that's what we do in our tests.
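A quick way to make that check, assuming the cars index from the original post and that both hosts are reachable:

```
curl 'http://localhost:9200/cars/transactions/_count?pretty'
curl 'http://192.168.128.1:9200/cars/transactions/_count?pretty'
```

If the two counts differ, the bulk writes and the Hive reads are hitting different clusters.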



ISSUE 2:

HIVE: select count(*) from cars2;

Your query has the following error(s):
Error while processing statement: FAILED: Execution Error, return code 2 from 
org.apache.hadoop.hive.ql.exec.mr.MapRedTask



Again, since you are querying a different host it's hard to tell what the issue is. count(*) works in our tests, but I've 
seen cases where count fails when dealing with the newly introduced types (like timestamp). You can use count(1) as an 
alternative, which should work just fine.
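For example, the suggested workaround against the cars2 table defined above would be:

```sql
SELECT COUNT(1) FROM cars2;
```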


--ISSUE 4:

CREATE EXTERNAL TABLE test1 (
 description STRING)
STORED BY 'org.elasticsearch.hadoop.hive.EsStorageHandler'
TBLPROPERTIES('es.host' = '192.168.128.1', 'es.port'='9200', 'es.resource' = 
'test1');

INSERT OVERWRITE TABLE test1 select description from sample_07;

Your query has the following error(s):

Error while processing statement: FAILED: Execution Error, return code 2 from 
org.apache.hadoop.hive.ql.exec.mr.MapRedTask



That is because you have an invalid table definition; the resource needs to point to an index/type, not just an index. 
If you look deep into the Hive exception, you should be able to see the actual validation message. Since Hive executes 
things lazily and on the server side, there's no other way of reporting the error to the user...


Hope this helps,


--
You received this message because you are subscribed to the Google Groups 
elasticsearch group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to
elasticsearch+unsubscr...@googlegroups.com 
mailto:elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit
https://groups.google.com/d/msgid/elasticsearch/8c642665-424a-48be-bc5d-8625b94243c0%40googlegroups.com
https://groups.google.com/d/msgid/elasticsearch/8c642665-424a-48be-bc5d-8625b94243c0%40googlegroups.com?utm_medium=emailutm_source=footer.
For more options, visit https://groups.google.com/d/optout.


--
Costin

--
You received this message because you are subscribed to the Google Groups 
elasticsearch group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/539B3BFF.509%40gmail.com.
For more options, visit https://groups.google.com/d/optout.

Re: Kibana 3 and changing the default field from _all to message

2014-06-13 Thread Brian
Ok, it's not a Kibana issue, but my Elasticsearch configuration issue. I 
could fix it in the elasticsearch.yml file, but I believe it's much safer 
to fix it in my less-likely-to-be-altered start-up script wrapper.

So now when I start ES via the bin/elasticsearch script, but only on behalf 
of the ELK stack, I add the following option to the command line:

-Des.index.query.default_field=message

And now, my default field for a Kibana (Lucene) query is message and not 
_all.

And _all is well (pun intended!).
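For reference, the elasticsearch.yml equivalent (ES 1.x) should be:

```yaml
# same effect as -Des.index.query.default_field=message on the command line
index.query.default_field: message
```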

Brian

-- 
You received this message because you are subscribed to the Google Groups 
elasticsearch group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/84b63fe8-523b-43f4-8522-6b8d392ff63c%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


Re: Securing Data in Elasticsearch

2014-06-13 Thread Harvii Dent
ES nodes would be locked down and accessible only to authorized users on 
the OS level; it's the ability to delete and update indices/documents 
remotely that's worrisome in this case.
 
Disabling the HTTP REST API completely is not possible since it's required by 
Kibana (running behind a reverse proxy), although I suppose I could 
restrict the ES node to accept traffic only from Logstash on port 9300 and 
from the reverse proxy on port 9200. Would this provide sufficient 
protection? 

Thanks

On Thursday, June 12, 2014 6:44:33 PM UTC+3, Jörg Prante wrote:

 If you want ES-level security, you should first reduce attack vectors, by 
 closing down all the open ports and resources that are not necessary.

 One step would be to disable HTTP REST API completely (port 9200) and run 
 Logstash Elasticsearch output only  
 http://logstash.net/docs/1.4.1/outputs/elasticsearch

 As a consequence, you could only kill the ES process on a node, or send 
 Java API commands. It is not possible to block Java API commands over port 
 9300, this is how nodes talk to each other. You could imagine a 
 self-written tool for administering your cluster that uses the Java API 
 only (from a J2EE web app for example)
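A minimal sketch of such a tool, assuming the ES 1.x Java API, a cluster named "mycluster", and a node reachable as "es-node" (all hypothetical):

```java
import org.elasticsearch.client.Client;
import org.elasticsearch.client.transport.TransportClient;
import org.elasticsearch.common.settings.ImmutableSettings;
import org.elasticsearch.common.settings.Settings;
import org.elasticsearch.common.transport.InetSocketTransportAddress;

public class AdminTool {
    public static void main(String[] args) {
        // connect over the transport protocol (port 9300), not HTTP
        Settings settings = ImmutableSettings.settingsBuilder()
                .put("cluster.name", "mycluster").build();
        Client client = new TransportClient(settings)
                .addTransportAddress(new InetSocketTransportAddress("es-node", 9300));
        // example admin operation: delete an index by name
        client.admin().indices().prepareDelete("old-index").get();
        client.close();
    }
}
```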

 On the node on OS level, you would have to protect the OS user of ES node 
 is running under from being accessed by third party users.

 Jörg



 On Thu, Jun 12, 2014 at 5:30 PM, Harvii Dent harvi...@gmail.com wrote:

 ES settings alone would be great, are there other options that I could 
 have missed? right now the main priority is preventing document 
 updates/deletes (and index deletes) via the ES rest api.

 Thanks


 On Thursday, June 12, 2014 6:21:36 PM UTC+3, Jörg Prante wrote:

 There are a lot of ways to tamper with ES files; physically, everything in 
 the files can be modified as long as your operating system permits more 
 than something like an append-only mode for ES files (not that I know 
 that would work).

 So it depends on your requirements about the security level you want to 
 reach, if ES settings alone can help you or if you need more (paranoid) 
 configurations.

 Jörg
  

 On Thu, Jun 12, 2014 at 4:48 PM, Harvii Dent harvi...@gmail.com wrote:

  Hello,

 I'm planning to use Elasticsearch with Logstash for logs management and 
 search, however, one thing I'm unable to find an answer for is making sure 
 that the data cannot be modified once it reaches Elasticsearch.

 action.destructive_requires_name prevents deleting all indices at 
 once, but they can still be deleted. Are there any options to prevent 
 deleting indices altogether? 

 And on the document level, is it possible to disable 'delete' *AND* 
 'update' operations without setting the entire index as read-only (ie. 
 'index.blocks.read_only')?

 Lastly, does setting 'index.blocks.read_only' ensure that the index 
 files on disk are not changed (so they can be monitored using a file 
 integrity monitoring solution)? as many regulatory and compliance bodies 
 have requirements for ensuring logs integrity.

 Thanks

  -- 
 You received this message because you are subscribed to the Google 
 Groups elasticsearch group.
 To unsubscribe from this group and stop receiving emails from it, send 
 an email to elasticsearc...@googlegroups.com.

 To view this discussion on the web visit https://groups.google.com/d/
 msgid/elasticsearch/dfc73db4-18ac-405e-8929-68be32b01a6c%
 40googlegroups.com 
 https://groups.google.com/d/msgid/elasticsearch/dfc73db4-18ac-405e-8929-68be32b01a6c%40googlegroups.com?utm_medium=emailutm_source=footer
 .
 For more options, visit https://groups.google.com/d/optout.


  -- 
 You received this message because you are subscribed to the Google Groups 
 elasticsearch group.
 To unsubscribe from this group and stop receiving emails from it, send an 
 email to elasticsearc...@googlegroups.com.
 To view this discussion on the web visit 
 https://groups.google.com/d/msgid/elasticsearch/190a707b-9edf-4128-9740-79d59f0bc209%40googlegroups.com
  
 https://groups.google.com/d/msgid/elasticsearch/190a707b-9edf-4128-9740-79d59f0bc209%40googlegroups.com?utm_medium=emailutm_source=footer
 .

 For more options, visit https://groups.google.com/d/optout.




-- 
You received this message because you are subscribed to the Google Groups 
elasticsearch group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/9339cfd0-9300-496e-bc00-4179725e02db%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


elasticsearch-php auth error

2014-06-13 Thread Patrick Marx
Hi,
I've been using the php client successfully with a remote server, and I've 
set up a new server and run into auth problems using the PHP client 
library. 

$clientParams['connectionParams']['auth'] = array(
'user',
'pw',
'Basic'
);

My issue is now I get back a 401 Authentication Required every time I try 
to hit the endpoints using the PHP client but I've gone on the chrome 
extension Postman to try sending some basic auth requests using the user/pw 
and the server responds correctly. Any ideas what might cause this or how 
to go about debugging? 
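One way to narrow it down, with a hypothetical host and credentials, is to compare a verbose curl request against what the PHP client sends:

```
curl -v -u user:pw 'http://new-server:9200/'
```

The -v output shows which layer returns the 401 (a proxy in front vs. the auth plugin itself) and what WWW-Authenticate challenge it expects; if curl succeeds while the client fails, turning on the client's request logging should reveal the Authorization header the library is actually sending.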

-- 
You received this message because you are subscribed to the Google Groups 
elasticsearch group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/8fc0743b-d3a9-4b4c-9484-deb647707afd%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


Re: issue of elasticsearch-hadoop-2.0.0 with Hive (cloudera and hortonworks), helps are needed

2014-06-13 Thread Elitemway
Thank you for the response. localhost and 192.168.128.1 are actually 
the same ES host. I installed the ES Cloudera VM on XP. I will try your suggestion 
though and report back. I will try the table without the timestamp column.

Sent from my iPhone

 On Jun 13, 2014, at 1:59 PM, Costin Leau costin.l...@gmail.com wrote:
 
 Hi,
 
 Sorry for the delayed response, travel and other things got in the way. I 
 have tried replicating the issue on my end and couldn't; see below:
 
 On 6/8/14 8:03 PM, elitem way wrote:
 I am learning the elasticsearch-hadoop. I have a few issues that I do not 
 understand. I am using ES 1.12 on Windows,
 elasticsearch-hadoop-2.0.0 and cloudera-quickstart-vm-5.0.0-0-vmware sandbox 
 with Hive.
 
 1. I loaded only 6 rows to ES index car/transactions. Why did Hive return 14 
 rows instead? See below.
 2. select count(*) from cars2 failed with code 2. Group by, sum also 
 failed. Did I miss anything. The similar
 query are successful when using sample_07 and sample_08 tables that come 
 with Hive.
 3. elasticsearch-hadoop-2.0.0 does not seem to work with jetty, the 
 authentication plugin. I got errors when I enabled
 jetty and set 'es.nodes' = 'superuser:admin@192.168.128.1'
 4. I could not pipe data from Hive to ElasticSearch either.
 
 *--ISSUE 1*:
 --load data to ES
 POST: http://localhost:9200/cars/transactions/_bulk
 { index: {}}
 { price : 3, color : green, make : ford, sold : 2014-05-18 
 }
 { index: {}}
 { price : 15000, color : blue, make : toyota, sold : 
 2014-07-02 }
 { index: {}}
 { price : 12000, color : green, make : toyota, sold : 
 2014-08-19 }
 { index: {}}
 { price : 2, color : red, make : honda, sold : 2014-11-05 }
 { index: {}}
 { price : 8, color : red, make : bmw, sold : 2014-01-01 }
 { index: {}}
 { price : 25000, color : blue, make : ford, sold : 2014-02-12 }
 
 CREATE EXTERNAL TABLE cars2 (color STRING, make STRING, price BIGINT, sold 
 TIMESTAMP)
 STORED BY 'org.elasticsearch.hadoop.hive.EsStorageHandler'
 TBLPROPERTIES('es.resource' = 'cars/transactions',
 'es.nodes' = '192.168.128.1', 'es.port'='9200');
 
 HIVE: select * from cars2;
 14 rows returned.
 
   color make price sold
 0 red honda 2 2014-11-05 00:00:00.0
 1 red honda 1 2014-10-28 00:00:00.0
 2 green ford 3 2014-05-18 00:00:00.0
 3 green toyota 12000 2014-08-19 00:00:00.0
 4 blue ford 25000 2014-02-12 00:00:00.0
 5 blue toyota 15000 2014-07-02 00:00:00.0
 6 red bmw 8 2014-01-01 00:00:00.0
 7 red honda 1 2014-10-28 00:00:00.0
 8 blue toyota 15000 2014-07-02 00:00:00.0
 9 red honda 2 2014-11-05 00:00:00.0
 10 green ford 3 2014-05-18 00:00:00.0
 11 green toyota 12000 2014-08-19 00:00:00.0
 12 red honda 2 2014-11-05 00:00:00.0
 13 red honda 2 2014-11-05 00:00:00.0
 14 red bmw 8 2014-01-01 00:00:00.0
 
 
 
 It looks like you are adding data to localhost:9200 but querying on 
 192.168.128.1:9200 - most likely they are different, hence
 the different data set. To double check, do a query/count through curl on ES 
 and then check the data through Hive - that's what we do in our tests.
 
 *ISSUE2:*
 
 HIVE: select count(*) from cars2;
 
 Your query has the following error(s):
 Error while processing statement: FAILED: Execution Error, return code 2 
 from org.apache.hadoop.hive.ql.exec.mr.MapRedTask
 
 
 Again since you are querying a different host it's hard to tell what's the 
 issue. count(*) works in our tests but I've seen cases where count fails when 
 dealing the newly introduced types (like timestamp). You can use count(1) as 
 an alternative which should work just fine.
 
 *--ISSUE 4:*
 
 CREATE EXTERNAL TABLE test1 (
 description STRING)
 STORED BY 'org.elasticsearch.hadoop.hive.EsStorageHandler'
 TBLPROPERTIES('es.host' = '192.168.128.1', 'es.port'='9200', 'es.resource' = 
 'test1');
 
 INSERT OVERWRITE TABLE test1 select description from sample_07;
 
 Your query has the following error(s):
 
 Error while processing statement: FAILED: Execution Error, return code 2 
 from org.apache.hadoop.hive.ql.exec.mr.MapRedTask
 
 
 That is because you have an invalid table definition; the resource needs to 
 point to a index/type not just an index - if you look deep into the Hive 
 exception, you should be able to see the actual validation message. Since 
 Hive executes things lazily and on the server side, there's no other way of 
 reporting the error to the user...
 
 Hope this helps,
 
 --
 You received this message because you are subscribed to the Google Groups 
 elasticsearch group.
 To unsubscribe from this group and stop receiving emails from it, send an 
 email to
 elasticsearch+unsubscr...@googlegroups.com 
 mailto:elasticsearch+unsubscr...@googlegroups.com.
 To view this discussion on the web visit
 https://groups.google.com/d/msgid/elasticsearch/8c642665-424a-48be-bc5d-8625b94243c0%40googlegroups.com
 https://groups.google.com/d/msgid/elasticsearch/8c642665-424a-48be-bc5d-8625b94243c0%40googlegroups.com?utm_medium=emailutm_source=footer.

Re: ingest performance degrades sharply along with the documents having more fileds

2014-06-13 Thread Cindy Hsin
Hi, Mark:

We are doing single-document ingestion. We did a performance comparison 
between Solr and Elasticsearch (ES).
The performance of ES degrades dramatically when we increase the number of 
metadata fields, whereas Solr's performance remains the same.
The benchmark uses a very small data set (10k documents; the index size is 
only 75MB), and the machine is a high-spec machine with 48GB of memory.
You can see ES performance drop 50% even when the machine has plenty of 
memory, and ES consumes all of the machine's memory when the metadata field 
count is increased to 100k.
This behavior seems abnormal since the data is really tiny.

We also tried larger data sets (100k and 1M documents); ES threw OOM errors 
for scenario 2 in the 1M-document run.
We want to know whether this is a bug in ES and/or whether there is any 
workaround (config step) we can use to eliminate the performance degradation.
Currently ES performance does not meet the customer requirement, so we want 
to see if there is any way we can bring ES performance to the same level as 
Solr.

Below are the configuration settings and benchmark results for the 
10k-document set.
Scenario 0 means there are 1000 different metadata fields in the system.
Scenario 1 means there are 10k different metadata fields in the system.
Scenario 2 means there are 100k different metadata fields in the system.
Scenario 3 means there are 1M different metadata fields in the system.

   - disable hard commit & soft commit + use a client to do the commit (ES & 
   Solr) every 10 seconds
      - ES: flush and refresh are disabled
      - Solr: autoSoftCommit is disabled
   - monitor load on the system (CPU, memory, etc.) and the ingestion speed 
   change over time
   - monitor the ingestion speed (is there any degradation over time?)
   - new ES config: new_ES_config.sh 
   https://stbeehive.oracle.com/content/dav/st/Cloud%20Search/Documents/new_ES_config.sh
   - new ES ingestion: new_ES_ingest_threads.pl 
   https://stbeehive.oracle.com/content/dav/st/Cloud%20Search/Documents/new_ES_ingest_threads.pl
   - new Solr ingestion: new_Solr_ingest_threads.pl 
   https://stbeehive.oracle.com/content/dav/st/Cloud%20Search/Documents/new_Solr_ingest_threads.pl
   - flush interval: 10s


Results by number of different metadata fields:

Scenario 0 (1000 fields):
  ES:   12 secs (833 docs/sec); CPU 30.24%; heap 1.08G; secs per 1k docs: 3 1 1 1 1 1 0 1 2 1; index size 36M; iowait 0.02%
  Solr: 13 secs (769 docs/sec); CPU 28.85%; heap 9.39G; secs per 1k docs: 2 1 1 1 1 1 1 1 2 2

Scenario 1 (10k fields):
  ES:   29 secs (345 docs/sec); CPU 40.83%; heap 5.74G; secs per 1k docs: 14 2 2 2 1 2 2 1 2 1; index size 36M; iowait 0.02%
  Solr: 12 secs (833 docs/sec); CPU 28.62%; heap 9.88G; secs per 1k docs: 1 1 1 1 2 1 1 1 1 2

Scenario 2 (100k fields):
  ES:   17 mins 44 secs (9.4 docs/sec); CPU 54.73%; heap 47.99G; secs per 1k docs: 97 183 196 147 109 89 87 49 66 40; index size 75M; iowait 0.02%
  Solr: 13 secs (769 docs/sec); CPU 29.43%; heap 9.84G; secs per 1k docs: 2 1 1 1 1 1 1 1 2 2

Scenario 3 (1M fields):
  ES:   183 mins 8 secs (0.9 docs/sec); CPU 40.47%; heap 47.99G; secs per 1k docs: 133 422 701 958 989 1322 1622 1615 1630 1594
  Solr: 15 secs (666.7 docs/sec); CPU 45.10%; heap 9.64G; secs per 1k docs: 2 1 1 1 1 2 1 1 3 2

Thanks!
Cindy

-- 
You received this message because you are subscribed to the Google Groups 
elasticsearch group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/4efc9c2d-ead4-4702-896d-dc32b5867859%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


HIVE-Elasticsearch [mapr-elasticsearch] write to elasticsearch issue

2014-06-13 Thread shankarramshivram
Hi ,

I am trying to integrate Elasticsearch with a MapR Hadoop cluster, using 
the Hive-Elasticsearch integration document. I am able to read data 
from the Elasticsearch node; however, I am not able to write data into the 
Elasticsearch node, which is my primary requirement. Please kindly guide 
me.

I always get the following errors:- 

2014-06-13 14:15:45,814 INFO org.apache.hadoop.hive.ql.exec.FileSinkOperator: 
New Final Path: FS 
maprfs:/user/hive/warehouse/dev.db/_tmp.shankar/02_0
2014-06-13 14:15:45,947 FATAL org.apache.hadoop.hive.ql.exec.mr.ExecMapper: 
java.lang.NoSuchMethodError: 
org.codehaus.jackson.JsonGenerator.writeUTF8String([BII)V
at 
org.elasticsearch.hadoop.serialization.json.JacksonJsonGenerator.writeUTF8String(JacksonJsonGenerator.java:123)
at 
org.elasticsearch.hadoop.mr.WritableValueWriter.write(WritableValueWriter.java:47)
at 
org.elasticsearch.hadoop.hive.HiveWritableValueWriter.write(HiveWritableValueWriter.java:83)
at 
org.elasticsearch.hadoop.hive.HiveWritableValueWriter.write(HiveWritableValueWriter.java:38)
at 
org.elasticsearch.hadoop.hive.HiveValueWriter.write(HiveValueWriter.java:69)
at 
org.elasticsearch.hadoop.hive.HiveValueWriter.write(HiveValueWriter.java:111)
at 
org.elasticsearch.hadoop.hive.HiveValueWriter.write(HiveValueWriter.java:55)
at 
org.elasticsearch.hadoop.hive.HiveValueWriter.write(HiveValueWriter.java:41)
at 
org.elasticsearch.hadoop.serialization.builder.ContentBuilder.value(ContentBuilder.java:258)
at 
org.elasticsearch.hadoop.serialization.bulk.TemplatedBulk.doWriteObject(TemplatedBulk.java:92)
at 
org.elasticsearch.hadoop.serialization.bulk.TemplatedBulk.write(TemplatedBulk.java:79)
at org.elasticsearch.hadoop.hive.EsSerDe.serialize(EsSerDe.java:128)
at 
org.apache.hadoop.hive.ql.exec.FileSinkOperator.processOp(FileSinkOperator.java:582)
at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:793)
at 
org.apache.hadoop.hive.ql.exec.SelectOperator.processOp(SelectOperator.java:87)
at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:793)
at 
org.apache.hadoop.hive.ql.exec.TableScanOperator.processOp(TableScanOperator.java:92)
at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:793)
at 
org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:540)
at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.map(ExecMapper.java:177)
at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:417)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:348)
at org.apache.hadoop.mapred.Child$4.run(Child.java:282)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1117)
at org.apache.hadoop.mapred.Child.main(Child.java:271)

2014-06-13 14:15:45,947 INFO org.apache.hadoop.hive.ql.exec.MapOperator: 3 
finished. closing... 
2014-06-13 14:15:45,947 INFO org.apache.hadoop.hive.ql.exec.MapOperator: 
DESERIALIZE_ERRORS:0
2014-06-13 14:15:45,947 INFO org.apache.hadoop.hive.ql.exec.TableScanOperator: 
0 finished. closing... 
2014-06-13 14:15:45,947 INFO org.apache.hadoop.hive.ql.exec.SelectOperator: 1 
finished. closing... 
2014-06-13 14:15:45,947 INFO org.apache.hadoop.hive.ql.exec.FileSinkOperator: 2 
finished. closing... 
2014-06-13 14:15:45,948 INFO org.apache.hadoop.hive.ql.exec.FileSinkOperator: 2 
Close done
2014-06-13 14:15:45,948 INFO org.apache.hadoop.hive.ql.exec.SelectOperator: 1 
Close done
2014-06-13 14:15:45,948 INFO org.apache.hadoop.hive.ql.exec.TableScanOperator: 
0 Close done
2014-06-13 14:15:45,948 INFO org.apache.hadoop.hive.ql.exec.MapOperator: 3 
Close done
2014-06-13 14:15:45,948 INFO org.apache.hadoop.hive.ql.exec.mr.ExecMapper: 
ExecMapper: processed 0 rows: used memory = 9514320
2014-06-13 14:15:45,992 INFO org.apache.hadoop.mapred.TaskLogsTruncater: 
Initializing logs' truncater with mapRetainSize=-1 and reduceRetainSize=-1
2014-06-13 14:15:46,024 WARN org.apache.hadoop.mapred.Child: Error running child
java.lang.RuntimeException: java.lang.NoSuchMethodError: 
org.codehaus.jackson.JsonGenerator.writeUTF8String([BII)V
at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.map(ExecMapper.java:195)
at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:417)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:348)
at org.apache.hadoop.mapred.Child$4.run(Child.java:282)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at 
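A NoSuchMethodError on org.codehaus.jackson classes like the one above is
typically a Jackson version conflict: an older jackson-core-asl jar somewhere
on the Hive/MapReduce task classpath gets loaded before the newer version that
es-hadoop was compiled against, and the old class lacks writeUTF8String. A
self-contained sketch of spotting duplicate artifact versions (the temp
directory and jar names below are made up for illustration; on a real cluster
you would point LIBDIR at the Hive and Hadoop lib directories):

```shell
# Two versions of the same artifact on one classpath is the classic cause
# of NoSuchMethodError: whichever jar is found first wins class loading.
# LIBDIR stands in for the Hive/Hadoop lib directories on a real cluster.
LIBDIR="$(mktemp -d)"
touch "$LIBDIR/jackson-core-asl-1.5.2.jar" "$LIBDIR/jackson-core-asl-1.8.8.jar"
# List "<artifact> <version>" for each jackson jar; more than one line per
# artifact means a conflict to resolve (remove or upgrade the older jar).
ls "$LIBDIR" | sed -n 's/^\(jackson-[a-z-]*\)-\([0-9][0-9.]*\)\.jar$/\1 \2/p' | sort
```

If two versions of the same artifact show up, removing or upgrading the older
jar (or putting the newer one first, e.g. via HIVE_AUX_JARS_PATH) is the usual
fix.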

Re: Securing Data in Elasticsearch

2014-06-13 Thread joergpra...@gmail.com
You should start HTTP only on localhost then and run Kibana on a selected
number of nodes only.

There are some authentication solutions for Kibana.

I am not able to find security features like audit trails or write
prevention in Kibana/ES, so you have to take care of that yourself.
Assessing Kibana for attacks over the web (intrusion detection, command
execution, etc.) would be useful; I don't know if anyone has tried such a
thing, but it is a very complex task.

Because this variant is tedious and maybe not successful, I would opt for a
different approach. Keep a checksummed copy of an index in a safe,
restricted place on a private ES cluster (or even burn it to optical
media) and rsync a copy of it to the unsafe place, the public ES cluster
where Kibana runs. Checksum verification can then prove whether the index
was modified in the meantime at the public location.
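That checksum scheme can be sketched with standard tools (paths below are
illustrative; a real run would point at the archived index directory and keep
the manifest at the restricted site):

```shell
# Record per-file SHA-256 digests of the archived index, then verify the
# rsynced copy against the manifest later.
INDEX_DIR="$(mktemp -d)"               # stands in for the archived index dir
printf 'segment data' > "$INDEX_DIR/_0.cfs"
MANIFEST="$INDEX_DIR.manifest"
( cd "$INDEX_DIR" && find . -type f -print0 | xargs -0 sha256sum ) > "$MANIFEST"
# Later, at the public site: exits non-zero if any file changed.
( cd "$INDEX_DIR" && sha256sum -c --quiet "$MANIFEST" ) && echo "index unmodified"
```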

Jörg



On Fri, Jun 13, 2014 at 8:18 PM, Harvii Dent harviid...@gmail.com wrote:

 ES nodes would be locked down and accessible only to authorized users on
 the OS level; it's the ability to delete and update indices/documents
 remotely that's worrisome in this case.

 Disabling the HTTP REST API completely is not possible since it's required
 by Kibana (running behind a reverse proxy), although I suppose I could
 restrict the ES node to only accept traffic from Logstash on port 9300 and
 from the reverse proxy on port 9200. Would this provide sufficient
 protection?

 Thanks

 On Thursday, June 12, 2014 6:44:33 PM UTC+3, Jörg Prante wrote:

 If you want ES-level security, you should first reduce attack vectors, by
 closing down all the open ports and resources that are not necessary.

 One step would be to disable the HTTP REST API completely (port 9200) and
 run the Logstash Elasticsearch output only:
 http://logstash.net/docs/1.4.1/outputs/elasticsearch

 As a consequence, you could only kill the ES process on a node, or send
 Java API commands. It is not possible to block Java API commands over port
 9300; this is how nodes talk to each other. You could imagine a
 self-written tool for administering your cluster that uses the Java API
 only (from a J2EE web app, for example).

 On the node, at OS level, you would have to protect the OS user the ES
 node is running under from being accessed by third-party users.

 Jörg



 On Thu, Jun 12, 2014 at 5:30 PM, Harvii Dent harvi...@gmail.com wrote:

 ES settings alone would be great; are there other options that I could
 have missed? Right now the main priority is preventing document
 updates/deletes (and index deletes) via the ES REST API.

 Thanks


 On Thursday, June 12, 2014 6:21:36 PM UTC+3, Jörg Prante wrote:

 There are a lot of ways to tamper with ES files; physically, anything in
 the files can be modified as long as your operating system permits more
 than something like an append-only mode for ES files (not that I know
 such a mode would work).

 So it depends on your requirements about the security level you want to
 reach, if ES settings alone can help you or if you need more (paranoid)
 configurations.

 Jörg


 On Thu, Jun 12, 2014 at 4:48 PM, Harvii Dent harvi...@gmail.com
 wrote:

  Hello,

 I'm planning to use Elasticsearch with Logstash for logs management and
 search; however, one thing I'm unable to find an answer for is making
 sure that the data cannot be modified once it reaches Elasticsearch.

 action.destructive_requires_name prevents deleting all indices at
 once, but they can still be deleted. Are there any options to prevent
 deleting indices altogether?

 And on the document level, is it possible to disable 'delete' *AND*
 'update' operations without setting the entire index as read-only (ie.
 'index.blocks.read_only')?

 Lastly, does setting 'index.blocks.read_only' ensure that the index
 files on disk are not changed (so they can be monitored using a file
 integrity monitoring solution)? Many regulatory and compliance bodies
 have requirements for ensuring log integrity.

 Thanks


Linear Scaling with ES

2014-06-13 Thread pranav amin
Hi,

We have been spending a considerable amount of time now just trying to 
figure out if we can get linear scaling in ES by increasing the number of 
nodes, shards, or other parameters. We ran many experiments, changing 
shards, nodes, replicas, etc., but with everything we tried we seemed to 
hit a limit.

I know this is a very broad question I'm asking, but does anyone know if it 
is even possible? Is there any formula or magic mantra to achieve this?

Thanks a lot in advance if someone can answer this. It would save me some 
time. 

Thanks
Pranav.

-- 
You received this message because you are subscribed to the Google Groups 
elasticsearch group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/d0122c86-f8d8-4e5a-bf07-fa225c2b787c%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


index.cache.filter.type

2014-06-13 Thread Nate Fox
I'm toying with the effects of different settings and noticed that setting
`index.cache.filter.type: none` works fine, but setting
`index.cache.filter.type: soft` or `index.cache.filter.type: weak` gives me
stack traces.
Am I doing it wrong? The docs mention soft, weak and resident as the
types available.
I'm running ES v1.1.0


org.elasticsearch.indices.IndexCreationException:
[centrallogging_awseast-2014-06-13] failed to create index
at
org.elasticsearch.indices.InternalIndicesService.createIndex(InternalIndicesService.java:300)
at
org.elasticsearch.indices.cluster.IndicesClusterStateService.applyNewIndices(IndicesClusterStateService.java:307)
at
org.elasticsearch.indices.cluster.IndicesClusterStateService.clusterChanged(IndicesClusterStateService.java:179)
at
org.elasticsearch.cluster.service.InternalClusterService$UpdateTask.run(InternalClusterService.java:424)
at
org.elasticsearch.common.util.concurrent.PrioritizedEsThreadPoolExecutor$TieBreakingPrioritizedRunnable.run(PrioritizedEsThreadPoolExecutor.java:134)
at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
Caused by: org.elasticsearch.common.settings.NoClassSettingsException:
Failed to load class setting [index.cache.filter.type] with value [soft]
at
org.elasticsearch.common.settings.ImmutableSettings.loadClass(ImmutableSettings.java:448)
at
org.elasticsearch.common.settings.ImmutableSettings.getAsClass(ImmutableSettings.java:436)
at
org.elasticsearch.index.cache.filter.FilterCacheModule.configure(FilterCacheModule.java:44)
at
org.elasticsearch.common.inject.AbstractModule.configure(AbstractModule.java:60)
at
org.elasticsearch.index.cache.IndexCacheModule.configure(IndexCacheModule.java:41)
at
org.elasticsearch.common.inject.AbstractModule.configure(AbstractModule.java:60)
at
org.elasticsearch.common.inject.spi.Elements$RecordingBinder.install(Elements.java:204)
at
org.elasticsearch.common.inject.spi.Elements.getElements(Elements.java:85)
at
org.elasticsearch.common.inject.InjectorShell$Builder.build(InjectorShell.java:130)
at
org.elasticsearch.common.inject.InjectorBuilder.build(InjectorBuilder.java:99)
at
org.elasticsearch.common.inject.InjectorImpl.createChildInjector(InjectorImpl.java:131)
at
org.elasticsearch.common.inject.ModulesBuilder.createChildInjector(ModulesBuilder.java:69)
at
org.elasticsearch.indices.InternalIndicesService.createIndex(InternalIndicesService.java:298)
... 7 more
Caused by: java.lang.ClassNotFoundException:
org.elasticsearch.index.cache.filter.soft.SoftFilterCache
at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
at
org.elasticsearch.common.settings.ImmutableSettings.loadClass(ImmutableSettings.java:446)
... 19 more

-- 
You received this message because you are subscribed to the Google Groups 
elasticsearch group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/CAHU4sP96QM3Rf5SF7Pd7tJvOib4dZyW9yEHJ_XgD6ZuApvbUNw%40mail.gmail.com.
For more options, visit https://groups.google.com/d/optout.


Showing stats from delete operation

2014-06-13 Thread jblum
Would it be possible to add some stats to the response from a DeleteByQuery 
giving information on how many objects were deleted?

-- 
You received this message because you are subscribed to the Google Groups 
elasticsearch group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/5afa5f13-88e9-481c-ba13-8f6343fd7023%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


Re: Linear Scaling with ES

2014-06-13 Thread Mark Walkom
The answer is - it depends.

If you can provide a bit more detail on what you've done, your setup etc,
maybe someone can provide more assistance.

Regards,
Mark Walkom

Infrastructure Engineer
Campaign Monitor
email: ma...@campaignmonitor.com
web: www.campaignmonitor.com


On 14 June 2014 07:48, pranav amin parulpate...@gmail.com wrote:

 Hi,

 We have been spending a considerable amount of time now just trying to
 figure out if we can get linear scaling in ES by increasing the number of
 nodes, shards, or other parameters. We ran many experiments, changing
 shards, nodes, replicas, etc., but with everything we tried we seemed to
 hit a limit.

 I know this is a very broad question I'm asking, but does anyone know if
 it is even possible? Is there any formula or magic mantra to achieve this?

 Thanks a lot in advance if someone can answer this. It would save me some
 time.

 Thanks
 Pranav.



-- 
You received this message because you are subscribed to the Google Groups 
elasticsearch group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/CAEM624YrsEVU9CPjj_vKWA4uExwPxjr6EvkyxrnwfG2S3TudiQ%40mail.gmail.com.
For more options, visit https://groups.google.com/d/optout.


Re: Showing stats from delete operation

2014-06-13 Thread Mark Walkom
You will need to raise a github request for this.

Regards,
Mark Walkom

Infrastructure Engineer
Campaign Monitor
email: ma...@campaignmonitor.com
web: www.campaignmonitor.com


On 14 June 2014 08:41, jb...@locu.com wrote:

 Would it be possible to add some stats to the response from a
 DeleteByQuery giving information on how many objects were deleted?



-- 
You received this message because you are subscribed to the Google Groups 
elasticsearch group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/CAEM624Y79gMUxTDZaCisMWQJ6wMgZGci82wuxB2y-y8Ofs%3DmzA%40mail.gmail.com.
For more options, visit https://groups.google.com/d/optout.


Re: RepositoryMissingException

2014-06-13 Thread Shawn Mullen
Well,  that was it.  I copied the sample PUT from the elasticsearch web 
site, which of course uses curl, and did not take out the -d.  Definitely 
helps to have another pair of eyes.  I was looking at that all day and 
didn't see the -d.  Thanks for your help.

Shawn
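For reference, without the stray curl flag the two requests would look
something like this in Sense (the repository name and snapshot body are taken
from the thread; the bucket and keys are placeholders):

```
PUT /_snapshot/statlogs
{
   "type": "s3",
   "settings": {
      "region": "us-east",
      "bucket": "my-bucket-name",
      "access_key": "my-access-key",
      "secret_key": "my-secret-key"
   }
}

PUT /_snapshot/statlogs/snapshot_1
{
   "indices": ["statexceptionlog"],
   "ignore_unavailable": true,
   "include_global_state": false
}
```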

On Friday, June 13, 2014 5:35:45 AM UTC-6, Shawn Mullen wrote:

 good question.  that is what is being returned when I make the call.  but 
 your question gave me an idea as to what the problem is. 

 thanks.
 On Jun 12, 2014 11:32 PM, David Pilato wrote:

 What is this -d in statlogs -d?

 --
 David ;-)
 Twitter : @dadoonet / @elasticsearchfr / @scrutmydocs


 On 13 June 2014 at 03:58, Shawn Mullen 

 I have an ElasticSearch instance running on my local machine.  I 
 installed the S3 plugin so I can do backup and restore operation to/from S3.

 I tried to follow the documentation on how to set this up.  I was able to 
 register a snapshot repository and I have a bucket in S3 created just for 
 backups.  When I do a /_all I see the current repo settings. So, at this 
 point all looks fine.  However, when I try to create a snapshot it fails 
 with RepositoryMissingException.

 This is what I get for a /_all:

 {
    "statlogs -d": {
       "type": "s3",
       "settings": {
          "region": "us-east",
          "bucket": "my-bucket-name",
          "access_key": "my-access-key",
          "secret_key": "my-secret-key"
       }
    }
 }

 This is what I am sending when I try to do a snapshot:

 PUT /_snapshot/statlogs/snapshot_1 -d 
 {
    "indices": ["statexceptionlog"],
    "ignore_unavailable": true,
    "include_global_state": false
 }

 I am using Sense to send the commands.

 I'm assuming I am getting the error because of something wrong with my S3 
 settings but I don't know what it would be.  I'm making this assumption 
 because the /_all returns data (but I guess that could be wrong).  Any 
 ideas on what the issue might be?  What exactly causes 
 RepositoryMissingException? 

 Thanks. 





-- 
You received this message because you are subscribed to the Google Groups 
elasticsearch group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/c8375d28-016e-4efb-9bcb-3ece6110f053%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


better places to store es.nodes and es.port in ES Hive integration?

2014-06-13 Thread Jinyuan Zhou
Hi, 
I am playing with the elasticsearch and hive integration. The documentation 
says to set configuration like es.nodes and es.port in TBLPROPERTIES. It 
works, but it leads to a lot of redundancy. If I have ten data sets to 
index into the same ES cluster, I would have to repeat this information ten 
times in TBLPROPERTIES. Even if I use variable substitution, I still have 
to write the substitution variable into each table definition. 
What I am looking for is to put this info in, say, one file and pass its 
location, in some way, to the hive CLI, so that the hive-elasticsearch 
integration picks up these settings when trying to find the ES server to 
talk to.
I am not looking to put this info into files like hive-site.xml. 

Thanks,

Jack
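To illustrate the repetition described above, a HiveQL sketch (table names,
hosts and the logs_*/doc resources are made up; es.nodes, es.port and
es.resource are the es-hadoop properties the documentation describes):

```
-- Every ES-backed table repeats the same connection properties:
CREATE EXTERNAL TABLE logs_a (ts STRING, msg STRING)
STORED BY 'org.elasticsearch.hadoop.hive.EsStorageHandler'
TBLPROPERTIES ('es.resource' = 'logs_a/doc',
               'es.nodes' = 'es-host1', 'es.port' = '9200');

-- Variable substitution trims the values, but the properties themselves
-- still have to appear in each DDL statement:
SET hivevar:es_nodes=es-host1;
CREATE EXTERNAL TABLE logs_b (ts STRING, msg STRING)
STORED BY 'org.elasticsearch.hadoop.hive.EsStorageHandler'
TBLPROPERTIES ('es.resource' = 'logs_b/doc',
               'es.nodes' = '${hivevar:es_nodes}', 'es.port' = '9200');
```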

-- 
You received this message because you are subscribed to the Google Groups 
elasticsearch group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/7040c805-e845-4b3d-a9fe-5e18d8445f7f%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


Re: Changing Kibana-int based on context

2014-06-13 Thread Mark Walkom
I don't think you can do this dynamically within kibana. The better way
would be to run multiple instances of KB and then use a proxy to handle the
redirects.
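A minimal sketch of that multi-instance idea, assuming nginx in front and one
Kibana copy per project, each copy's config.js pointing kibana_index at its
own index (hostnames, ports and paths are illustrative):

```
# Each upstream is a separate Kibana instance whose config.js sets
# kibana_index to "kibana-projA" / "kibana-projB" respectively.
location /projA/ { proxy_pass http://kibana-a.internal:5601/; }
location /projB/ { proxy_pass http://kibana-b.internal:5601/; }
```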

Regards,
Mark Walkom

Infrastructure Engineer
Campaign Monitor
email: ma...@campaignmonitor.com
web: www.campaignmonitor.com


On 14 June 2014 00:43, mysterydark diyabi...@gmail.com wrote:

  I am a newbie to computer science in general, and at present I am working
 on a project which involves Elasticsearch, Logstash, and Kibana to build a
 centralized logging system. In Kibana's config.js there is a parameter
 kibana_index whose default value is set to kibana-int. Is there a way to
 change the value of kibana_index based on the context? What I could
 understand from my research is that kibana-int is the index which stores
 all the dashboards. By context, I mean that if I have multiple projects in
 an organization, the dropdown on the Kibana dashboard page should show
 only the dashboards under a particular project when I give that project's
 name as the context in my URL, so people working on a project get to see
 only the dashboards in their project. The only way I could find is to
 change the kibana_index value per project, to something like kibana-projA,
 so it shows all the dashboards under that particular index. But I couldn't
 find a way to do it. Could you please help me out?

 Any help would be appreciated.

 Thanks.



-- 
You received this message because you are subscribed to the Google Groups 
elasticsearch group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/CAEM624ZKf0ZHiJdnySNdjYPqgPb3qJ3Y%3D0W%2BgmCTVucmKveamw%40mail.gmail.com.
For more options, visit https://groups.google.com/d/optout.


elasticsearch curator — version 1.1.0 released

2014-06-13 Thread Mark Walkom
http://www.elasticsearch.org/blog/elasticsearch-curator-version-1-1-0-released/

When Elasticsearch version 1.0.0 was released, it came with a new feature:
 Snapshot  Restore. The Snapshot portion of this feature allows you to
 create backups by taking a “picture” of your indices at a particular point
 in time. Soon after this announcement, the feature requests began to
 accumulate. Things like, “Add snapshots to Curator!” or “When will Curator
 be able to do snapshots?” If this has been your desire, your wish has
 finally been granted…and much, much more in addition!


There looks to be a whole heap of cool stuff added, snapshots, aliases,
allocation routing and more!

Regards,
Mark Walkom

Infrastructure Engineer
Campaign Monitor
email: ma...@campaignmonitor.com
web: www.campaignmonitor.com

-- 
You received this message because you are subscribed to the Google Groups 
elasticsearch group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/CAEM624axebWs4N43N_6aOZCPaBGOzLhdyHKWQKTnZMgDPWNpWw%40mail.gmail.com.
For more options, visit https://groups.google.com/d/optout.


Re: elasticsearch curator — version 1.1.0 released

2014-06-13 Thread Mark Walkom
It has a prefix setting, but not a suffix.

Regards,
Mark Walkom

Infrastructure Engineer
Campaign Monitor
email: ma...@campaignmonitor.com
web: www.campaignmonitor.com


On 14 June 2014 13:35, Ivan Brusic i...@brusic.com wrote:

 The addition of the snapshot feature is interesting, but I just wish there
 was a way to specify the index names instead of just specifying the dates.
 I haven't downloaded it yet, but it does have a prefix setting. I need a
 suffix setting.

 --
 Ivan


 On Fri, Jun 13, 2014 at 5:38 PM, Mark Walkom ma...@campaignmonitor.com
 wrote:


 http://www.elasticsearch.org/blog/elasticsearch-curator-version-1-1-0-released/

 When Elasticsearch version 1.0.0 was released, it came with a new
 feature: Snapshot  Restore. The Snapshot portion of this feature allows
 you to create backups by taking a “picture” of your indices at a particular
 point in time. Soon after this announcement, the feature requests began to
 accumulate. Things like, “Add snapshots to Curator!” or “When will Curator
 be able to do snapshots?” If this has been your desire, your wish has
 finally been granted…and much, much more in addition!


 There looks to be a whole heap of cool stuff added, snapshots, aliases,
 allocation routing and more!

 Regards,
 Mark Walkom

 Infrastructure Engineer
 Campaign Monitor
 email: ma...@campaignmonitor.com
 web: www.campaignmonitor.com



-- 
You received this message because you are subscribed to the Google Groups 
elasticsearch group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/CAEM624YjkRv9M%3DqXfREiKsqM38KCBHN_xj6nLbRENHyFwCDk1A%40mail.gmail.com.
For more options, visit https://groups.google.com/d/optout.