Elasticsearch large heap usage.

2015-05-20 Thread John Smith
Hi, running...

ES 1.5.2
Java 1.8.0_45
Windows Server 2008
4 nodes, each with 32 cores, 128GB RAM, and 5TB of SSD storage

ES_HEAP_SIZE configured to 30g


I just finished bulk indexing 350,000,000 records, but all 4 nodes are 
sitting at 60% heap usage and not collecting. I have YourKit running on 1 
node and I tried forcing GC, but nothing went down.
I know explicit GC is disabled in the .bat files, but with YourKit I'm 
still able to force a collection, and I was able to collect some memory 
before.

Here is what is in the logs when I force GC from YourKit...

[ES xxx 01-01 ()] [gc][young][87406][39154] duration [1.6s], 
collections [1]/[2.5s], total [1.6s]/[17h], memory 
[18.1gb]->[17.1gb]/[30gb], all_pools {[young] 
[1gb]->[32mb]/[0b]}{[survivor] [112mb]->[128mb]/[0b]}{[old] 
[16.9gb]->[16.9gb]/[30gb]}
[ES xxx 01-01 ()] [gc][young][87489][39155] duration [1.6s], 
collections [1]/[1.8s], total [1.6s]/[17h], memory 
[18.4gb]->[17.1gb]/[30gb], all_pools {[young] 
[1.3gb]->[0b]/[0b]}{[survivor] [128mb]->[128mb]/[0b]}{[old] 
[16.9gb]->[16.9gb]/[30gb]}
[ES xxx 01-01 ()] [gc][old][87496][3] duration [43.2s], collections 
[1]/[44.2s], total [43.2s]/[1.3m], memory [17.2gb]->[14.9gb]/[30gb], 
all_pools {[young] [136mb]->[0b]/[0b]}{[survivor] [128mb]->[0b]/[0b]}{[old] 
[16.9gb]->[14.9gb]/[30gb]}
[ES xxx 01-01 ()] [gc][young][87576][39156] duration [1.6s], 
collections [1]/[2.7s], total [1.6s]/[17h], memory [16.3gb]->[15gb]/[30gb], 
all_pools {[young] [1.3gb]->[0b]/[0b]}{[survivor] [0b]->[80mb]/[0b]}{[old] 
[14.9gb]->[14.9gb]/[30gb]}
[ES xxx 01-01 ()] [gc][young][87680][39157] duration [1.5s], 
collections [1]/[2.1s], total [1.5s]/[17h], memory 
[16.3gb]->[14.9gb]/[30gb], all_pools {[young] 
[1.3gb]->[0b]/[0b]}{[survivor] [80mb]->[32mb]/[0b]}{[old] 
[14.9gb]->[14.9gb]/[30gb]}
[ES xxx 01-01 ()] [gc][young][87770][39158] duration [1.6s], 
collections [1]/[1.9s], total [1.6s]/[17h], memory 
[16.4gb]->[14.9gb]/[30gb], all_pools {[young] 
[1.4gb]->[0b]/[0b]}{[survivor] [32mb]->[24mb]/[0b]}{[old] 
[14.9gb]->[14.9gb]/[30gb]}
[ES xxx 01-01 ()] [gc][young][87861][39159] duration [1.7s], 
collections [1]/[2.7s], total [1.7s]/[17h], memory 
[16.4gb]->[14.9gb]/[30gb], all_pools {[young] 
[1.4gb]->[0b]/[0b]}{[survivor] [24mb]->[24mb]/[0b]}{[old] 
[14.9gb]->[14.9gb]/[30gb]}
[ES xxx 01-01 ()] [gc][young][87953][39160] duration [1.5s], 
collections [1]/[1.9s], total [1.5s]/[17h], memory 
[16.3gb]->[14.9gb]/[30gb], all_pools {[young] 
[1.3gb]->[0b]/[0b]}{[survivor] [24mb]->[24mb]/[0b]}{[old] 
[14.9gb]->[14.9gb]/[30gb]}
[ES xxx 01-01 ()] [gc][young][88043][39161] duration [1.6s], 
collections [1]/[1.9s], total [1.6s]/[17h], memory 
[16.4gb]->[14.9gb]/[30gb], all_pools {[young] 
[1.4gb]->[0b]/[0b]}{[survivor] [24mb]->[32mb]/[0b]}{[old] 
[14.9gb]->[14.9gb]/[30gb]}
[ES xxx 01-01 ()] [gc][old][88079][4] duration [37.9s], collections 
[1]/[38.1s], total [37.9s]/[2m], memory [15.5gb]->[14.9gb]/[30gb], 
all_pools {[young] [544mb]->[8mb]/[0b]}{[survivor] [32mb]->[0b]/[0b]}{[old] 
[14.9gb]->[14.9gb]/[30gb]}

As you can see, I forced it twice and not much got collected...

Is it the memory-mapped files that are taking up the space? Right now the 
cluster is idle.
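For what it's worth, the ~60% figure can be read straight off the log lines above: even after the forced full GC, the old generation still holds ~14.9gb of the 30gb heap, i.e. roughly half. A small sketch of that arithmetic (my own parsing helper, assuming the ES 1.x GC log format shown above; this is not an ES API):

```python
import re

def old_gen_fraction(entry):
    """Extract the post-GC old-gen occupancy from an ES 1.x GC log line
    and return it as a fraction of the configured max heap."""
    m = re.search(r"\[old\] \[([\d.]+)gb\]->\[([\d.]+)gb\]/\[([\d.]+)gb\]", entry)
    if m is None:
        raise ValueError("no old-gen pool figures found")
    _before, after, maximum = map(float, m.groups())
    return after / maximum

# The final forced full GC from the logs above:
line = ("[gc][old][88079][4] duration [37.9s], collections [1]/[38.1s], "
        "total [37.9s]/[2m], memory [15.5gb]->[14.9gb]/[30gb], "
        "all_pools {[old] [14.9gb]->[14.9gb]/[30gb]}")
print(round(old_gen_fraction(line), 3))  # 0.497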


-- 
Please update your bookmarks! We have moved to https://discuss.elastic.co/
--- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/2ef10ae2-a95c-4c63-a9b8-6bc050d48508%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


Re: What does it mean when refresh rate is high?

2015-03-30 Thread John Smith
Thanks, but the description is just very generic: "High values indicate slow I/O".

Wouldn't that mean other I/O-related tasks would also show as red?

Also, given that I'm running SanDisk 960GB Extreme Pro drives in RAID 0, the 
vendor specs (repeated across the three capacity variants) are:

Seq. Read (up to): 550 / 550 / 550 MB/s
Seq. Write (up to): 520 / 515 / 515 MB/s
Rnd. Read (up to): 100K / 100K / 100K IOPS
Rnd. Write (up to): 90K / 90K / 90K IOPS

I also gave my indexing stats above. So why would refresh be red and not 
the rest, if I/O is the problem?

I would figure the disk configuration is more than adequate?



On Saturday, 28 March 2015 01:40:16 UTC-4, Mark Walkom wrote:
>
> If you mouse over the field you can see what the value means.
>
> On 27 March 2015 at 05:25, John Smith wrote:
>
>> Using 1.4.3
>>
>> So I have a nice "beefy" cluster 4 nodes of 32 cores, 128GB RAM, 5TB RAID 
>> 0 (Using Sandisk 960GB Extreme Pro) for each machine.
>>
>> I am indexing about 4000 documents per second at an average of about 800 
>> bytes per document. At the same time as indexing I'm running queries.
>>
>> Looking at Elastic HQ numbers.
>>
>> Indexing - Index: 0.28ms / 0.32ms / 0.3ms / 0.32ms
>> Indexing - Delete: 0ms / 0ms / 0ms / 0ms
>> Search - Query: 29.23ms / 29.36ms / 24.46ms / 36.63ms
>> Search - Fetch: 0.25ms / 0.24ms / 0.25ms / 0.21ms
>> Get - Total: 0.67ms / 0.46ms / 0ms / 0.48ms
>> Get - Exists: 1.19ms / 0.65ms / 0ms / 0.48ms
>> Get - Missing: 0ms / 0.03ms / 0ms / 0ms
>> Refresh: 25.32ms / 24.86ms / 24.5ms / 24.81ms
>> Flush: 104.14ms / 90.45ms / 111.14ms / 84.63ms
>>
>> No matter what test I've done or machine configuration, the refresh rate 
>> has always been red... What does it mean and does it matter?
>>
>
>



What does it mean when refresh rate is high?

2015-03-26 Thread John Smith
Using 1.4.3

So I have a nice "beefy" cluster 4 nodes of 32 cores, 128GB RAM, 5TB RAID 0 
(Using Sandisk 960GB Extreme Pro) for each machine.

I am indexing about 4000 documents per second at an average of about 800 
bytes per document. At the same time as indexing I'm running queries.

Looking at Elastic HQ numbers.

Indexing - Index: 0.28ms / 0.32ms / 0.3ms / 0.32ms
Indexing - Delete: 0ms / 0ms / 0ms / 0ms
Search - Query: 29.23ms / 29.36ms / 24.46ms / 36.63ms
Search - Fetch: 0.25ms / 0.24ms / 0.25ms / 0.21ms
Get - Total: 0.67ms / 0.46ms / 0ms / 0.48ms
Get - Exists: 1.19ms / 0.65ms / 0ms / 0.48ms
Get - Missing: 0ms / 0.03ms / 0ms / 0ms
Refresh: 25.32ms / 24.86ms / 24.5ms / 24.81ms
Flush: 104.14ms / 90.45ms / 111.14ms / 84.63ms

No matter what test I've done or machine configuration, the refresh rate 
has always been red... What does it mean and does it matter?
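As a rough back-of-envelope (my own arithmetic, assuming the default 1-second index.refresh_interval; actual segment sizes also depend on shard count and compression):

```python
# Workload described above: 4000 docs/s at ~800 bytes each.
docs_per_second = 4000
avg_doc_bytes = 800
refresh_interval_s = 1.0  # ES default

ingest_mb_per_s = docs_per_second * avg_doc_bytes / (1024 * 1024)
mb_per_refresh = ingest_mb_per_s * refresh_interval_s

print(f"ingest: {ingest_mb_per_s:.2f} MB/s")     # ingest: 3.05 MB/s
print(f"per refresh: {mb_per_refresh:.2f} MB")   # per refresh: 3.05 MB
```

At ~3 MB written per refresh, raw sequential throughput on these drives seems unlikely to be the bottleneck.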





Re: mapreduce with filter script?

2015-02-25 Thread John Smith
So do a terms aggregation on the brands field?

http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/search-aggregations.html
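Something along these lines as the _search request body (a sketch; the field name "brands" comes from the example documents below, and "distinct_brands" is just an illustrative bucket name):

```python
import json

request_body = {
    "size": 0,  # suppress the hits; we only want the buckets
    "aggs": {
        "distinct_brands": {
            # "size": 0 on a terms agg asks ES 1.x to return every
            # bucket rather than the default top 10
            "terms": {"field": "brands", "size": 0}
        }
    }
}
print(json.dumps(request_body, indent=2))
```

Each bucket in the response then carries one distinct brand plus its document count, which gives the de-duplicated list you described.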


On Wednesday, 25 February 2015 09:50:19 UTC-5, bryan rasmussen wrote:
>
> Hi, 
>
> I would like to get a script to work like mapreduce over the results of my 
> query, so that if I have a query that returns 4 documents
> {
>  brands: "brand1"
> },
> {
>  brands: "brand2"
> },
> {
>  brands: "brand2"
> },
> {
>  brands: "brand3"
> }
>
> and what I want to come out is the documents
>
> {
>  brands: "brand1"
> },
> {
>  brands: "brand2"
> },
> {
>  brands: "brand3"
> }
>
> of course this is a very simplified example, and the query I am doing is 
> not on the brands field but it is the brands field I want to reduce / get 
> rid of duplicates. 
>
> thanks,
> Bryan Rasmussen
>
>



Re: Some interesting storage numbers for people interested.

2015-02-23 Thread John Smith
Mark, when you say you cannot re-index the document, do you mean re-index 
within the cluster? But if we resubmit the document using the index API, it 
will get re-indexed and updated to version 2, right?

So Elasticsearch will mark the old document as deleted in its segment and 
eventually merge away the "updated" data?

On Monday, 23 February 2015 17:19:49 UTC-5, John Smith wrote:
>
> Yeah, sorry, I should have mentioned the tradeoffs.
>
> I think there are some interesting use-cases here. For instance, if you are 
> building a pure analytics dashboard where it's 100% aggregations, then you 
> can save a lot of space with _source: false, _all: false.
>
> In my case I'm opting for _source: true, _all: false, since I need to 
> re-index documents but don't care about the _all search. My users are 
> required to pick the field they want to search through a drop-down... So 
> it's good for the ~25% saving.
>
> On Monday, 23 February 2015 17:04:35 UTC-5, Jack Park wrote:
>>
>> Thank you very much, Mark.
>>
>> On Mon, Feb 23, 2015 at 12:54 PM, Mark Walkom  wrote:
>>
>>> Thanks John, this is a really interesting test.
>>>
>>>
>>> If you have no _source you cannot reindex or view the actual raw content 
>>> that was sent to ES, only the analysed portions you keep.
>>> No _all means you have to know the exact field you want to search on or 
>>> else you may get no results, as ES will search _all by default (think of it 
>>> as a shortcut search field).
>>>
>>>
>>> As an aside, we are working on adding a new compression algorithm for ES 
>>> which will also improve storage capacity.
>>>
>>> On 24 February 2015 at 07:27, Jack Park  wrote:
>>>
>>>> What is lost (the tradeoff) when _source is disabled?
>>>> What is lost when _all is disabled?
>>>>
>>>> This is interesting!
>>>>
>>>> Thanks
>>>> Jack
>>>>
>>>>
>>>> On Mon, Feb 23, 2015 at 12:10 PM, John Smith wrote:
>>>>
>>>>> I don't run a blog but I thought I would share some results with the 
>>>>> community.
>>>>>
>>>>> Using Elasticsearch 1.4.3
>>>>>
>>>>> I wanted to test the various ways we could save some storage on our ES 
>>>>> index and here are some numbers
>>>>>
>>>>> Created 6 different indexes with the various mapping settings.
>>>>> Each index containing 4 types.
>>>>> Insert 100,000 documents per type so total 400,000 per index.
>>>>> Average document size 300-400 bytes.
>>>>>
>>>>> The values represent the total primary space taken by each index based 
>>>>> on the different mapping settings.
>>>>>
>>>>> _source: true = 45MB 
>>>>> _source: true, _all: false = 34MB
>>>>> _source: false = 30MB
>>>>> _source: false, _all: false = 18MB
>>>>> _source: false, store: true (all fields) = 39.5MB
>>>>> _source: false, store: true (all fields), _all: false = 28.5MB
>>>>>
>>>>> As you can see the default _source setting takes the most space, while 
>>>>> disabling the _source and _all field saves the most space.
>>>>>
>>>>>
>>>>>

Re: Some interesting storage numbers for people interested.

2015-02-23 Thread John Smith
Yeah, sorry, I should have mentioned the tradeoffs.

I think there are some interesting use-cases here. For instance, if you are 
building a pure analytics dashboard where it's 100% aggregations, then you 
can save a lot of space with _source: false, _all: false.

In my case I'm opting for _source: true, _all: false, since I need to 
re-index documents but don't care about the _all search. My users are 
required to pick the field they want to search through a drop-down... So 
it's good for the ~25% saving.

On Monday, 23 February 2015 17:04:35 UTC-5, Jack Park wrote:
>
> Thank you very much, Mark.
>
> On Mon, Feb 23, 2015 at 12:54 PM, Mark Walkom wrote:
>
>> Thanks John, this is a really interesting test.
>>
>>
>> If you have no _source you cannot reindex or view the actual raw content 
>> that was sent to ES, only the analysed portions you keep.
>> No _all means you have to know the exact field you want to search on or 
>> else you may get no results, as ES will search _all by default (think of it 
>> as a shortcut search field).
>>
>>
>> As an aside, we are working on adding a new compression algorithm for ES 
>> which will also improve storage capacity.
>>
>> On 24 February 2015 at 07:27, Jack Park wrote:
>>
>>> What is lost (the tradeoff) when _source is disabled?
>>> What is lost when _all is disabled?
>>>
>>> This is interesting!
>>>
>>> Thanks
>>> Jack
>>>
>>>
>>> On Mon, Feb 23, 2015 at 12:10 PM, John Smith wrote:
>>>
>>>> I don't run a blog but I thought I would share some results with the 
>>>> community.
>>>>
>>>> Using Elasticsearch 1.4.3
>>>>
>>>> I wanted to test the various ways we could save some storage on our ES 
>>>> index and here are some numbers
>>>>
>>>> Created 6 different indexes with the various mapping settings.
>>>> Each index containing 4 types.
>>>> Insert 100,000 documents per type so total 400,000 per index.
>>>> Average document size 300-400 bytes.
>>>>
>>>> The values represent the total primary space taken by each index based 
>>>> on the different mapping settings.
>>>>
>>>> _source: true = 45MB 
>>>> _source: true, _all: false = 34MB
>>>> _source: false = 30MB
>>>> _source: false, _all: false = 18MB
>>>> _source: false, store: true (all fields) = 39.5MB
>>>> _source: false, store: true (all fields), _all: false = 28.5MB
>>>>
>>>> As you can see the default _source setting takes the most space, while 
>>>> disabling the _source and _all field saves the most space.
>>>>
>>>>
>>>>



Some interesting storage numbers for people interested.

2015-02-23 Thread John Smith
I don't run a blog but I thought I would share some results with the 
community.

Using Elasticsearch 1.4.3

I wanted to test the various ways we could save some storage on our ES 
index; here are the numbers.

Created 6 different indexes with the various mapping settings.
Each index contains 4 types.
Inserted 100,000 documents per type, so 400,000 total per index.
Average document size: 300-400 bytes.

The values represent the total primary space taken by each index based on 
the different mapping settings.

_source: true = 45MB 
_source: true, _all: false = 34MB
_source: false = 30MB
_source: false, _all: false = 18MB
_source: false, store: true (all fields) = 39.5MB
_source: false, store: true (all fields), _all: false = 28.5MB

As you can see, the default _source setting takes the most space, while 
disabling both the _source and _all fields saves the most.
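Expressed as savings relative to the 45MB default (my own arithmetic from the numbers above):

```python
baseline_mb = 45.0  # _source: true, _all enabled (the default)
variants = {
    "_source: true, _all: false": 34.0,
    "_source: false": 30.0,
    "_source: false, _all: false": 18.0,
    "_source: false, store: true (all fields)": 39.5,
    "_source: false, store: true (all fields), _all: false": 28.5,
}
for name, size_mb in variants.items():
    saving_pct = (1 - size_mb / baseline_mb) * 100
    print(f"{name}: {saving_pct:.1f}% smaller")
```

So disabling _all alone buys ~24% (the "25% saving" mentioned in the follow-up), and disabling both _source and _all buys 60%.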





Where does client connect when using dedicated master nodes?

2015-02-16 Thread John Smith
Using ES 1.4.3 and Java 1.8_31

4 servers of 32 cores, 128GB RAM, and 4TB of data each

So 128 cores, 512GB RAM, 16TB data total


Two questions...


Question 1: Because I have such "wonderful" machines, I have decided to run 
dedicated master nodes on the same physical machines as the data nodes, in 
the same OS instance but in separate JVMs. Is it even worth it, or should I 
just use the default config until I can get real physical machines for the 
masters?

So I have...

server1:9000 (http of master node)
server1:9100 (tcp of master node)
server1:9200 (http of data node)
server1:9300 (tcp of data node)

server2:9000 (http of master node)
server2:9100 (tcp of master node)
server2:9200 (http of data node)
server2:9300 (tcp of data node)

server3:9000 (http of master node)
server3:9100 (tcp of master node)
server3:9200 (http of data node)
server3:9300 (tcp of data node)

server4:9200 (http of data node)
server4:9300 (tcp of data node)

Master nodes are configured with a 4GB heap
Data nodes with a 30GB heap

Question 2: When running dedicated master nodes, do we configure the 
clients to connect to the master nodes or to the actual data nodes?
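For reference, the master/data split above is usually expressed per JVM in each node's elasticsearch.yml; a sketch matching the port layout listed (these are standard ES 1.x settings, ports are the ones from my layout):

```yaml
# elasticsearch.yml for the master-only JVM on each machine
node.master: true
node.data: false
http.port: 9000
transport.tcp.port: 9100
---
# elasticsearch.yml for the data-only JVM on the same machine
node.master: false
node.data: true
http.port: 9200
transport.tcp.port: 9300
```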




Re: Grouping by keys that include nil values

2015-02-06 Thread John Smith
You would need to use a placeholder value: if you know the value is NULL or 
empty, then index the document with a string value of "NIL" or "N/A" or 
something similar...
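A minimal sketch of that pre-indexing step (my own helper, run on the client side before the document is sent to ES; the sentinel and field name are illustrative):

```python
MISSING = "N/A"  # sentinel; pick any string that cannot collide with real values

def with_placeholder(doc, field):
    """Return a copy of the document with a missing/empty field replaced
    by the sentinel, so it lands in a real aggregation bucket."""
    value = doc.get(field)
    if value is None or value == "":
        return dict(doc, **{field: MISSING})
    return doc

print(with_placeholder({"brand": None}, "brand"))    # {'brand': 'N/A'}
print(with_placeholder({"brand": "acme"}, "brand"))  # {'brand': 'acme'}
```

The aggregation then returns an "N/A" bucket in place of the records that would otherwise be dropped.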

On Wednesday, 4 February 2015 20:41:18 UTC-5, Mehul Kar wrote:
>
> Ideally, I'd like to see these records grouped under a nil key or 
> something automatically. 
>
>
> --
> Mehul Kar
> @mehulkar
>
> On Wed, Feb 4, 2015 at 5:31 PM, Mehul Kar wrote:
>
>> Hello, 
>>
>> I'm looking to query an index for groups by a specific key. There are 
>> documents that have nil values in in these queries, and by default ES 
>> doesn't return these records in the aggregations key of the response. 
>>
>> I've tried applying a `missing` filter, and it looks like the `hits` key 
>> in the response includes all the records, but the aggregations key does 
>> not. 
>>
>> What is a good strategy for this? Will I have to post process these 
>> results? Index blank strings for nil values? 
>>



Re: Possible? Wildcard template for a collection of fields to solve some dynamic mapping woes

2015-02-06 Thread John Smith
A template won't help you here. I mean, templates are good and you should 
use them, but once the schema is defined you can't change it. This is no 
different than any database.

Your best bet here is to do a bit of data cleansing/normalizing.

If you know that the field is a date field and the format sometimes varies, 
then you have to try to convert it to a proper date format before inserting, 
especially if you are trying to push it all into one field.

Even if you use wildcards in templates as suggested above, you would still 
have to know that the date format is different to have it pushed to another 
field.
On Friday, 6 February 2015 06:41:49 UTC-5, Itamar Syn-Hershko wrote:
>
> You mean something like dynamic templates? 
> http://code972.com/blog/2015/02/81-elasticsearch-one-tip-a-day-using-dynamic-templates-to-avoid-rigorous-mappings
>
> --
>
> Itamar Syn-Hershko
> http://code972.com | @synhershko 
> Freelance Developer & Consultant
> Lucene.NET committer and PMC member
>
> On Fri, Feb 6, 2015 at 1:39 PM, Paul Kavanagh wrote:
>
>> Hi all,
>> We're having a MapperParsingException problem with some field values when 
>> we get when we use the JSON Filter for Logstash to explode out a JSON 
>> document to Elasticsearch fields.
>>
>> In 99.9% of cases, certain of these fields are either blank, or contain 
>> dates in the format of -mm-dd. This allows ES to dynamically map this 
>> field to type dateOptionalTime.
>>
>> However, we occasionally see non-standard date formats in these fields, 
>> which our main service can handle fine, but which throws a 
>> MapperParsingException in Elasticsearch - such are here:
>>
>>
>>
>> [2015-02-06 10:46:50,679][WARN ][cluster.action.shard ] [logging-
>> production-elasticsearch-ip-xxx-xxx-xxx-148] [logstash-2015.02.06][2] 
>> received shard failed for [logstash-2015.02.06][2], node[
>> GZpltBjAQUqGyp2B1SLz_g], [R], s[INITIALIZING], indexUUID [BEdTwj-
>> QRuOZB713YAQwvA], reason [Failed to start shard, message [
>> RecoveryFailedException[[logstash-2015.02.06][2]: Recovery failed from [
>> logging-production-elasticsearch-ip-xxx-xxx-xxx-82][IALW-92RReiLffQjSL3I-
>> g][logging-production-elasticsearch-ip-xxx-xxx-xxx-82][inet[ip-xxx-xxx-
>> xxx-82.ec2.internal/xxx.xxx.xxx.82:9300]]{max_local_storage_nodes=1, 
>> aws_availability_zone=us-east-1e, aws_az=us-east-1e} into [logging-
>> production-elasticsearch-ip-xxx-xxx-xxx-148][GZpltBjAQUqGyp2B1SLz_g][
>> logging-production-elasticsearch-ip-xxx-xxx-xxx-148][inet[ip-xxx.xxx.xxx.
>> 148.ec2.internal/xxx.xxx.xxx.148:9300]]{max_local_storage_nodes=1, 
>> aws_availability_zone=us-east-1c, aws_az=us-east-1c}]; nested: 
>> RemoteTransportException[[logging-production-elasticsearch-ip-xxx-xxx-xxx
>> -82][inet[/xxx.xxx.xxx.82:9300]][internal:index/shard/recovery/
>> start_recovery]]; nested: RecoveryEngineException[[logstash-2015.02.06][2
>> ] Phase[2] Execution failed]; nested: RemoteTransportException[[logging-
>> production-elasticsearch-ip-xxx-xxx-xxx-148][inet[/xxx.xxx.xxx.148:9300
>> ]][internal:index/shard/recovery/translog_ops]]; nested: 
>> MapperParsingException[failed to parse [apiservice.logstash.@fields.
>> parameters.start_time]]; nested: MapperParsingException[failed to parse 
>> date field [Feb 5 2015 12:00 AM], tried both date format [
>> dateOptionalTime], and timestamp number with locale []]; nested: 
>> IllegalArgumentException[Invalid format: "Feb 5 2015 12:00 AM"]; ]]
>>
>> 2015-02-06 10:46:53,685][WARN ][cluster.action.shard ] [logging-
>> production-elasticsearch-ip-xxx-xxx-xxx-148] [logstash-2015.02.06][2] 
>> received shard failed for [logstash-2015.02.06][2], node[
>> GZpltBjAQUqGyp2B1SLz_g], [R], s[INITIALIZING], indexUUID [BEdTwj-
>> QRuOZB713YAQwvA], reason [master [logging-production-elasticsearch-ip-xxx
>> -xxx-xxx-148][GZpltBjAQUqGyp2B1SLz_g][logging-production-elasticsearch-ip
>> -xxx-xxx-xxx-148][inet[ip-xxx-xxx-xxx-148.ec2.internal/xxx.xxx.xxx.148:
>> 9300]]{max_local_storage_nodes=1, aws_availability_zone=us-east-1c, 
>> aws_az=us-east-1c} marked shard as initializing, but shard is marked as 
>> failed, resend shard failure]
>>
>>
>> Our planned solution was to create a template for Logstash indices that 
>> will set these fields to string. But as the field above isn't the only 
>> culprit, and more may be added overtime, it makes more sense to create a 
>> template to map all fields under apiservice.logstash.@fields.parameters.* 
>> to be string. (We never need to query on user entered data, but it's great 
>> to have logged for debugging)
>>
>> Is it possible to do this with a template? I could not find a way to do 
>> this via the template documentation on the ES site. 
>>
>> Any guidance would be great!
>>
>> Thanks,
>> -Paul
>>

Re: Sizing master nodes

2015-02-04 Thread John Smith
Yeah, I have four boxes now, so I can at least put the master nodes on the 
same physical boxes in separate JVM instances.

So: 4 data nodes and 3 master nodes.



Re: Sizing master nodes

2015-02-03 Thread John Smith
And right, I don't envision more than a 10-12 node cluster.

On Tuesday, 3 February 2015 16:45:01 UTC-5, John Smith wrote:
>
> Eh, well, I have 128GB:
>
> 32GB for ES data
> 32GB for the OS for file caching?
>
> And then there's another 64GB to play with, so I suppose 4GB for a master 
> node is OK.
>
> But is it OK to deploy the master node on the same box as a data node, in 
> a separate JVM process, since I have the horsepower?
>
> On Tuesday, 3 February 2015 10:48:10 UTC-5, David Pilato wrote:
>>
>> For a master only nodes on a small cluster (I mean small number of 
>> nodes/indices/mappings), you can probably set HEAP to 1gb. (512 Mb should 
>> work as well probably)
>>
>> My 2 cents
>>
>> -- 
>> *David Pilato* | *Technical Advocate* | *Elasticsearch.com 
>> <http://Elasticsearch.com>*
>> @dadoonet <https://twitter.com/dadoonet> | @elasticsearchfr 
>> <https://twitter.com/elasticsearchfr> | @scrutmydocs 
>> <https://twitter.com/scrutmydocs>
>>
>>
>>  
>> On 3 Feb 2015 at 16:09, John Smith wrote:
>>
>> Hi I read bunch of things on master nodes etc...
>>
>> So I have some decent machines 4x 32core 128GB RAM and 6x960GB SSDS (RAID 
>> 5 so 4.3 TB)
>>
>> So all 4 machines will be data nodes configured to 32GB.
>>
>> I was also thinking of running the master nodes on same machines on 
>> separate JVM using N/2+1. But how much RAM do master nodes need to be etc?
>>

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/e99beb41-88cd-4bab-a9b1-bb620979ee03%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


Re: Sizing master nodes

2015-02-03 Thread John Smith
Eh well I have 128GB

32GB for ES data
32GB for the OS for file caching?

And then there's another 64GB to play with, so I suppose 4GB for the master 
node is ok.

But is it ok to deploy the master node on the same box as a data node, in a 
separate JVM process, since I have the horsepower?

On Tuesday, 3 February 2015 10:48:10 UTC-5, David Pilato wrote:
>
> For a master only nodes on a small cluster (I mean small number of 
> nodes/indices/mappings), you can probably set HEAP to 1gb. (512 Mb should 
> work as well probably)
>
> My 2 cents
>
> -- 
> *David Pilato* | *Technical Advocate* | *Elasticsearch.com 
> <http://Elasticsearch.com>*
> @dadoonet <https://twitter.com/dadoonet> | @elasticsearchfr 
> <https://twitter.com/elasticsearchfr> | @scrutmydocs 
> <https://twitter.com/scrutmydocs>
>
>
>  
> Le 3 févr. 2015 à 16:09, John Smith > a 
> écrit :
>
> Hi I read bunch of things on master nodes etc...
>
> So I have some decent machines 4x 32core 128GB RAM and 6x960GB SSDS (RAID 
> 5 so 4.3 TB)
>
> So all 4 machines will be data nodes configured to 32GB.
>
> I was also thinking of running the master nodes on same machines on 
> separate JVM using N/2+1. But how much RAM do master nodes need to be etc?
>
> -- 
> You received this message because you are subscribed to the Google Groups 
> "elasticsearch" group.
> To unsubscribe from this group and stop receiving emails from it, send an 
> email to elasticsearc...@googlegroups.com .
> To view this discussion on the web visit 
> https://groups.google.com/d/msgid/elasticsearch/a6443665-2999-4e66-938e-8b5ac49dbd59%40googlegroups.com
>  
> <https://groups.google.com/d/msgid/elasticsearch/a6443665-2999-4e66-938e-8b5ac49dbd59%40googlegroups.com?utm_medium=email&utm_source=footer>
> .
> For more options, visit https://groups.google.com/d/optout.
>
>
>



Sizing master nodes

2015-02-03 Thread John Smith
Hi, I've read a bunch of things on master nodes etc...

So I have some decent machines: 4x 32-core, 128GB RAM and 6x 960GB SSDs (RAID 
5, so 4.3 TB).

All 4 machines will be data nodes with the heap configured to 32GB.

I was also thinking of running the master nodes on the same machines in a 
separate JVM, using N/2+1. But how much RAM do master nodes need, etc?
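
A dedicated master on the same box would just be a second ES process; a sketch 
of the ES 1.x settings involved (values are illustrative, assuming 3 of the 4 
boxes run a master-only process):

```yaml
# elasticsearch.yml for the master-only process (illustrative values)
node.master: true
node.data: false
# 3 master-eligible nodes -> quorum N/2+1 = 2
discovery.zen.minimum_master_nodes: 2
# a second process on the same box needs its own ports and paths
http.port: 9201
transport.tcp.port: 9301
```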



Re: Resume scroll-scan query?

2014-10-23 Thread John Smith
A small ttl is ok (well, adjusted properly for your process) because every time 
you call scroll it resets the ttl. So you don't need to set a 60m scroll 
time; it just has to be long enough to process the next scroll id.

I'm curious whether you can re-use a scroll id. It's not specifically 
mentioned in the docs, but I think scroll is forward-only. So I'm not sure 
that once you've got a scroll id you can go back to it. I guess one way to 
find out :)
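
The checkpoint idea can be sketched with a fake page source standing in for 
the real scroll API (everything here, PAGES and fetch_page, is invented for 
illustration; a real client would call _search/scroll and persist the id 
somewhere durable):

```python
# Fake scroll pages: scroll_id -> (next_scroll_id, hits).
# An empty hits list means the scroll is exhausted.
PAGES = {
    None: ("s0", ["doc-0a", "doc-0b"]),
    "s0": ("s1", ["doc-1a", "doc-1b"]),
    "s1": ("s2", ["doc-2a"]),
    "s2": ("s2", []),
}

def fetch_page(scroll_id):
    """Stand-in for a real call against _search/scroll."""
    return PAGES[scroll_id]

def run_scroll(start_id=None):
    """Process pages, remembering the last scroll id as a checkpoint."""
    processed, last_id = [], start_id
    while True:
        next_id, hits = fetch_page(last_id)
        if not hits:
            return processed, last_id
        processed.extend(hits)
        last_id = next_id  # in real code, persist this before continuing

# A full run processes everything; restarting from a checkpoint ("s1")
# only fetches the pages after it.
print(run_scroll())
print(run_scroll("s1"))
```

Whether an old scroll id is still answerable on a real cluster is exactly the 
open question above; at worst the checkpoint only saves you from re-processing 
pages you already handled.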

On Thursday, 23 October 2014 15:44:04 UTC-4, Roger de Cordova Farias wrote:
>
> Hmm, I was using a small ttl, just enough to process each scroll call, but 
> I could try using a longer time to live and resuming from the last 
> scroll_id in case of error
>
> That is a good idea, thanks
>
> 2014-10-23 17:12 GMT-02:00 John Smith >:
>
>> The scroll is available based on a timeout value you give it. 
>> Everytimetime you scroll you restart the countdown.
>>
>> You could track the last scroll id you used and try it again from there?
>>
>> On Thursday, 23 October 2014 12:47:02 UTC-4, Roger de Cordova Farias 
>> wrote:
>>>
>>> I'm reindexing a ElasticSearch base with 50m docs using the scroll-scan 
>>> request to retrieve all docs, but my "reindexer" program stopped at 30m
>>>
>>> Is there a way to redo the query to retrieve the left docs? Like using 
>>> offset?
>>>
>>> Would the the internal order of the scan query be the same with a second 
>>> request?
>>>
>>> I can assure that no new docs were indexed in the old index since the 
>>> beginning of the reindexing
>>>
>>  -- 
>> You received this message because you are subscribed to a topic in the 
>> Google Groups "elasticsearch" group.
>> To unsubscribe from this topic, visit 
>> https://groups.google.com/d/topic/elasticsearch/NbshHCrBHoM/unsubscribe.
>> To unsubscribe from this group and all its topics, send an email to 
>> elasticsearc...@googlegroups.com .
>> To view this discussion on the web visit 
>> https://groups.google.com/d/msgid/elasticsearch/45b51276-daa4-4f39-b46c-017296db689a%40googlegroups.com
>>  
>> <https://groups.google.com/d/msgid/elasticsearch/45b51276-daa4-4f39-b46c-017296db689a%40googlegroups.com?utm_medium=email&utm_source=footer>
>> .
>>
>> For more options, visit https://groups.google.com/d/optout.
>>
>
>



Re: Resume scroll-scan query?

2014-10-23 Thread John Smith
The scroll is kept alive based on a timeout value you give it. Every time 
you scroll you restart the countdown.

You could track the last scroll id you used and try again from there?

On Thursday, 23 October 2014 12:47:02 UTC-4, Roger de Cordova Farias wrote:
>
> I'm reindexing a ElasticSearch base with 50m docs using the scroll-scan 
> request to retrieve all docs, but my "reindexer" program stopped at 30m
>
> Is there a way to redo the query to retrieve the left docs? Like using 
> offset?
>
> Would the the internal order of the scan query be the same with a second 
> request?
>
> I can assure that no new docs were indexed in the old index since the 
> beginning of the reindexing
>



Aggregations with missing values

2014-10-23 Thread John Smith
Using 1.3.4

Also found this...


https://github.com/elasticsearch/elasticsearch/issues/5324


Would a workaround be to put some placeholder like "N/A" instead of having a 
missing field?

An example doc where we have phone but not fax:

{
   "phone":""
}


Add "N/A" for fields that do not have values:

{
   "phone":"",
   "fax":"N/A"
}


Apart from requiring extra disk space, I don't see this slowing down queries, 
or would it?
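
Also, for the aggregation side, ES 1.x already has a `missing` aggregation 
that buckets docs lacking the field, which might avoid the placeholder 
entirely; a search-body sketch (index and agg names are made up):

```json
{
  "aggs": {
    "no_fax": { "missing": { "field": "fax" } }
  }
}
```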



Re: Which RAID config for ES?

2014-09-25 Thread John Smith
So assuming a 2-node cluster: if you lose the array on one machine, you still 
have the other.

I guess it comes down to the chances that multiple RAID0 arrays fail at the 
same time across the cluster.

On Thursday, 25 September 2014 13:47:13 UTC-4, John Smith wrote:
>
> Not so much concerned about the performance boost of RAID 0 but rather how 
> fault tolerant is ES on RAID 0.
>
> On Thursday, 25 September 2014 10:12:10 UTC-4, Jörg Prante wrote:
>>
>> If you want speed, use RAID 0.
>>
>> I have 4 cheap SSDs on one SAS-2 controller in RAID 0.
>>
>> Example for SSD
>>
>>
>> http://core0.staticworld.net/images/article/2014/06/intel_ssd_raid-100315228-orig.png
>>
>> Jörg
>>
>>
>> On Thu, Sep 25, 2014 at 4:01 PM, Nikolas Everett  
>> wrote:
>>
>>>
>>>
>>> On Thu, Sep 25, 2014 at 9:58 AM, John Smith  wrote:
>>>
>>> So given the built in fault tolerance of Elasticsearch across the 
>>>> cluster are people adventurous enough to use RAID0?
>>>>
>>>
>>> Absolutely.  We only do it with pairs of disks though because RAID0 on 
>>> any more then two disks just feels squicky.
>>>
>>> Nik 
>>>



Re: Which RAID config for ES?

2014-09-25 Thread John Smith
Not so much concerned about the performance boost of RAID 0 but rather how 
fault tolerant is ES on RAID 0.

On Thursday, 25 September 2014 10:12:10 UTC-4, Jörg Prante wrote:
>
> If you want speed, use RAID 0.
>
> I have 4 cheap SSDs on one SAS-2 controller in RAID 0.
>
> Example for SSD
>
>
> http://core0.staticworld.net/images/article/2014/06/intel_ssd_raid-100315228-orig.png
>
> Jörg
>
>
> On Thu, Sep 25, 2014 at 4:01 PM, Nikolas Everett  > wrote:
>
>>
>>
>> On Thu, Sep 25, 2014 at 9:58 AM, John Smith > > wrote:
>>
>> So given the built in fault tolerance of Elasticsearch across the cluster 
>>> are people adventurous enough to use RAID0?
>>>
>>
>> Absolutely.  We only do it with pairs of disks though because RAID0 on 
>> any more then two disks just feels squicky.
>>
>> Nik 
>>



Re: Which RAID config for ES?

2014-09-25 Thread John Smith
Not so much concerned about speed; I know that RAID0 will give some 
performance boost.

Just curious how tolerant an ES cluster is on RAID 0.

On Thursday, 25 September 2014 10:12:10 UTC-4, Jörg Prante wrote:
>
> If you want speed, use RAID 0.
>
> I have 4 cheap SSDs on one SAS-2 controller in RAID 0.
>
> Example for SSD
>
>
> http://core0.staticworld.net/images/article/2014/06/intel_ssd_raid-100315228-orig.png
>
> Jörg
>
>
> On Thu, Sep 25, 2014 at 4:01 PM, Nikolas Everett  > wrote:
>
>>
>>
>> On Thu, Sep 25, 2014 at 9:58 AM, John Smith > > wrote:
>>
>> So given the built in fault tolerance of Elasticsearch across the cluster 
>>> are people adventurous enough to use RAID0?
>>>
>>
>> Absolutely.  We only do it with pairs of disks though because RAID0 on 
>> any more then two disks just feels squicky.
>>
>> Nik 
>>



Re: Which RAID config for ES?

2014-09-25 Thread John Smith
So all your machines just have 2 drives each? Isn't that expensive?

On Thursday, 25 September 2014 10:01:42 UTC-4, Nikolas Everett wrote:
>
>
>
> On Thu, Sep 25, 2014 at 9:58 AM, John Smith  > wrote:
>
> So given the built in fault tolerance of Elasticsearch across the cluster 
>> are people adventurous enough to use RAID0?
>>
>
> Absolutely.  We only do it with pairs of disks though because RAID0 on any 
> more then two disks just feels squicky.
>
> Nik 
>



Which RAID config for ES?

2014-09-25 Thread John Smith
So given the built-in fault tolerance of Elasticsearch across the cluster, are 
people adventurous enough to use RAID0?

I'm thinking of middle ground like RAID5...



Re: Some efficient ways to export data other then JSON from Elasticsearch.

2014-09-16 Thread John Smith
Hadn't looked at Jackson for a while, but it seems to do both XML and CSV 
(CSV limited to JSON that represents tabular data).

On Tuesday, 16 September 2014 10:48:58 UTC-4, John Smith wrote:
>
> Yep, already doing that part actually...
>
> Was just wondering I guess the best way to deserialize from json to xml 
> for instance.
>
> I suppose it's slightly off topic but what are some good json to xml 
> converters.
>
> On Tuesday, 16 September 2014 10:23:05 UTC-4, David Pilato wrote:
>>
>> You need to use the scan and scroll API for that.
>>
>> See 
>> http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/search-request-scroll.html#scroll-scan
>>
>> This class could help you in Java: 
>> https://github.com/elasticsearch/elasticsearch/blob/master/src/test/java/org/elasticsearch/search/scroll/SearchScrollTests.java
>>
>>
>> HTH
>>
>> -- 
>> *David Pilato* | *Technical Advocate* | *Elasticsearch.com*
>> @dadoonet <https://twitter.com/dadoonet> | @elasticsearchfr 
>> <https://twitter.com/elasticsearchfr>
>>
>>
>> Le 16 septembre 2014 à 16:13:13, John Smith (java.d...@gmail.com) a 
>> écrit:
>>
>> Also it has to be done on the back end so JAVA it is...
>>
>> On Tuesday, 16 September 2014 10:04:44 UTC-4, John Smith wrote: 
>>>
>>> Hi, building some sort of internal tool to export data from 
>>> Elasticsearch and I would liek to offer csv or XML. 
>>>
>>> Just wondering what options there are...
>>>
>>>
>>> Bassically a user can login to a front end (No I cannot use what is out 
>>> there, it's only a small portion of a larger tool within the organization) 
>>> put their query in and then select the export format: JSON, XML, CSV.
>>>
>>> When I have the SearchResponse from onResponse() what options do i have 
>>> here?
>>>
>>> Thanks
>>>  



Re: Some efficient ways to export data other then JSON from Elasticsearch.

2014-09-16 Thread John Smith
Yep, already doing that part actually...

I was just wondering, I guess, about the best way to convert from JSON to XML, 
for instance.

I suppose it's slightly off topic, but what are some good JSON-to-XML 
converters?

On Tuesday, 16 September 2014 10:23:05 UTC-4, David Pilato wrote:
>
> You need to use the scan and scroll API for that.
>
> See 
> http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/search-request-scroll.html#scroll-scan
>
> This class could help you in Java: 
> https://github.com/elasticsearch/elasticsearch/blob/master/src/test/java/org/elasticsearch/search/scroll/SearchScrollTests.java
>
>
> HTH
>
> -- 
> *David Pilato* | *Technical Advocate* | *Elasticsearch.com*
> @dadoonet <https://twitter.com/dadoonet> | @elasticsearchfr 
> <https://twitter.com/elasticsearchfr>
>
>
> Le 16 septembre 2014 à 16:13:13, John Smith (java.d...@gmail.com 
> ) a écrit:
>
> Also it has to be done on the back end so JAVA it is...
>
> On Tuesday, 16 September 2014 10:04:44 UTC-4, John Smith wrote: 
>>
>> Hi, building some sort of internal tool to export data from Elasticsearch 
>> and I would liek to offer csv or XML. 
>>
>> Just wondering what options there are...
>>
>>
>> Bassically a user can login to a front end (No I cannot use what is out 
>> there, it's only a small portion of a larger tool within the organization) 
>> put their query in and then select the export format: JSON, XML, CSV.
>>
>> When I have the SearchResponse from onResponse() what options do i have 
>> here?
>>
>> Thanks
>>  



Re: Some efficient ways to export data other then JSON from Elasticsearch.

2014-09-16 Thread John Smith
Also, it has to be done on the back end, so Java it is...

On Tuesday, 16 September 2014 10:04:44 UTC-4, John Smith wrote:
>
> Hi, building some sort of internal tool to export data from Elasticsearch 
> and I would liek to offer csv or XML.
>
> Just wondering what options there are...
>
>
> Bassically a user can login to a front end (No I cannot use what is out 
> there, it's only a small portion of a larger tool within the organization) 
> put their query in and then select the export format: JSON, XML, CSV.
>
> When I have the SearchResponse from onResponse() what options do i have 
> here?
>
> Thanks
>



Some efficient ways to export data other then JSON from Elasticsearch.

2014-09-16 Thread John Smith
Hi, I'm building an internal tool to export data from Elasticsearch, and I 
would like to offer CSV or XML.

Just wondering what options there are...


Basically a user can log in to a front end (no, I cannot use what is out 
there; it's only a small portion of a larger tool within the organization), 
put their query in, and then select the export format: JSON, XML, CSV.

When I have the SearchResponse from onResponse(), what options do I have 
here?

Thanks
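
For the CSV side, flat hit sources can be rendered with nothing but the 
standard library; a sketch (field names and sample data are made up, and each 
dict stands in for a hit's _source):

```python
import csv
import io

def hits_to_csv(hits, fields):
    """Render a list of _source dicts (one per hit) as CSV text."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=fields, extrasaction="ignore")
    writer.writeheader()
    for source in hits:
        # missing fields become empty cells rather than raising
        writer.writerow({f: source.get(f, "") for f in fields})
    return buf.getvalue()

hits = [{"phone": "555-0100", "fax": "555-0101"}, {"phone": "555-0200"}]
print(hits_to_csv(hits, ["phone", "fax"]))
```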



Re: How to do case insensitive search on terms?

2014-09-15 Thread John Smith
Thanks, match works also.

On Monday, 15 September 2014 12:09:43 UTC-4, Nikolas Everett wrote:
>
> Use a match instead of a query_string - query_string has funky syntax that 
> can activate fuzzy matching and search on other fields and stuff - probably 
> not what you want.  Otherwise it shouldn't have any real impact on 
> performance.
>
> On Mon, Sep 15, 2014 at 12:04 PM, John Smith  > wrote:
>
>> Thanks!
>>
>> Yes I want case insensitive search and yes I'm using default analyzer.
>>
>> Just as you guys answered I tried this...
>>
>> {
>>   "query": {
>> "query_string": {
>>   "fields": ["logType"],
>>   "query": "ABC"
>> }
>>   }
>> }
>>
>> and it worked. Since I'm searching for "exact" matches (not wildards or 
>> anything like that) does it make a difference in performance? So far it 
>> seems like no at least when testing through Sense on the same amount of 
>> data.
>>
>> On Monday, 15 September 2014 11:49:33 UTC-4, Nikolas Everett wrote:
>>>
>>> Or if you want case insensitive search use a match query 
>>> <http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/query-dsl-match-query.html>
>>> .
>>>
>>> On Mon, Sep 15, 2014 at 11:47 AM, joerg...@gmail.com >> > wrote:
>>>
>>>> I assume you use the standard analyzer which uses by default a token 
>>>> filter "lowercase".
>>>>
>>>> Just use a custom analyzer, without "lowercase" token filter, and you 
>>>> will get case-sensitive search.
>>>>
>>>> Jörg
>>>>
>>>> On Mon, Sep 15, 2014 at 5:44 PM, John Smith  
>>>> wrote:
>>>>
>>>>> Using ES 1.3.2
>>>>>
>>>>> The current application I'm building only uses term queries for exact 
>>>>> matches.
>>>>>
>>>>> Example query
>>>>>
>>>>>   "query": {
>>>>> "term": {
>>>>>   "logType": "abc"
>>>>> }
>>>>>
>>>>>
>>>>> The field logType is pulled from external DB as all caps so for 
>>>>> instance ABC
>>>>>
>>>>> If i send the query
>>>>>
>>>>>   "query": {
>>>>> "term": {
>>>>>   "logType": "ABC"
>>>>> }
>>>>>
>>>>>
>>>>> I get no results.
>>>>>
>>>>> If I send the query
>>>>>
>>>>>   "query": {
>>>>> "term": {
>>>>>   "logType": "abc"
>>>>> }
>>>>>
>>>>> I get results.
>>>>>
>>>>> So does this mean I need to toLower the input before building the 
>>>>> query or there's an ES way of doing this?
>>>>>
>>>>> Thanks
>>>>>
>>>>>
>>>>>
>>>>>



Re: How to do case insensitive search on terms?

2014-09-15 Thread John Smith
Thanks!

Yes I want case-insensitive search, and yes I'm using the default analyzer.

Just as you guys answered, I tried this...

{
  "query": {
"query_string": {
  "fields": ["logType"],
  "query": "ABC"
}
  }
}

and it worked. Since I'm searching for "exact" matches (not wildcards or 
anything like that), does it make a difference in performance? So far it 
seems like no, at least when testing through Sense on the same amount of 
data.
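
For reference, the match form of the same search, which runs the input through 
the field's analyzer (so "ABC" is lowercased at query time):

```json
{
  "query": {
    "match": {
      "logType": "ABC"
    }
  }
}
```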

On Monday, 15 September 2014 11:49:33 UTC-4, Nikolas Everett wrote:
>
> Or if you want case insensitive search use a match query 
> <http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/query-dsl-match-query.html>
> .
>
> On Mon, Sep 15, 2014 at 11:47 AM, joerg...@gmail.com  <
> joerg...@gmail.com > wrote:
>
>> I assume you use the standard analyzer which uses by default a token 
>> filter "lowercase".
>>
>> Just use a custom analyzer, without "lowercase" token filter, and you 
>> will get case-sensitive search.
>>
>> Jörg
>>
>> On Mon, Sep 15, 2014 at 5:44 PM, John Smith > > wrote:
>>
>>> Using ES 1.3.2
>>>
>>> The current application I'm building only uses term queries for exact 
>>> matches.
>>>
>>> Example query
>>>
>>>   "query": {
>>> "term": {
>>>   "logType": "abc"
>>> }
>>>
>>>
>>> The field logType is pulled from external DB as all caps so for instance 
>>> ABC
>>>
>>> If i send the query
>>>
>>>   "query": {
>>> "term": {
>>>   "logType": "ABC"
>>> }
>>>
>>>
>>> I get no results.
>>>
>>> If I send the query
>>>
>>>   "query": {
>>> "term": {
>>>   "logType": "abc"
>>> }
>>>
>>> I get results.
>>>
>>> So does this mean I need to toLower the input before building the query 
>>> or there's an ES way of doing this?
>>>
>>> Thanks
>>>
>>>
>>>
>>>



How to do case insensitive search on terms?

2014-09-15 Thread John Smith
Using ES 1.3.2

The current application I'm building only uses term queries for exact 
matches.

Example query

  "query": {
    "term": {
      "logType": "abc"
    }
  }


The field logType is pulled from an external DB as all caps, so for instance ABC

If I send the query

  "query": {
    "term": {
      "logType": "ABC"
    }
  }


I get no results.

If I send the query

  "query": {
    "term": {
      "logType": "abc"
    }
  }

I get results.

So does this mean I need to toLower the input before building the query, or 
is there an ES way of doing this?

Thanks
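One ES-side option (a sketch, untested here; the index/type names "logs" are 
made up) is to index logType through a custom analyzer that keeps the whole 
value as one token but lowercases it:

```json
{
  "settings": {
    "analysis": {
      "analyzer": {
        "lowercase_keyword": {
          "type": "custom",
          "tokenizer": "keyword",
          "filter": ["lowercase"]
        }
      }
    }
  },
  "mappings": {
    "logs": {
      "properties": {
        "logType": {
          "type": "string",
          "analyzer": "lowercase_keyword"
        }
      }
    }
  }
}
```

Note that a term query bypasses analysis entirely, so with this mapping you 
would switch to a match query (which is analyzed at query time), or keep the 
term query and lowercase the input client-side as suggested above.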




-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/8fec1f56-1a59-4d17-bb01-f2ecd62bfbb4%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


Does the wares transport support the plugin sites?

2014-09-11 Thread John Smith
I installed wares and it seems to work fine, though it doesn't serve the 
plugin sites.

When I try to access the _plugin URL I get a "No handler registered for 
_plugin".

I haven't touched servlets in ages, so I assume it may be doable with a 
filter...

So map the ES servlet to /es and use the default servlet to load static site 
content, though this requires copying the plugins folder into WEB-INF or 
something like that...



Re: Add field to all documents of an index.

2014-09-02 Thread John Smith
Thanks! That's what I have been thinking! :)


On 2 September 2014 13:15, vineeth mohan  wrote:

> Hello John ,
>
> Update by query plugin would be a good choice -
> https://github.com/yakaz/elasticsearch-action-updatebyquery/
> Just set a match_all query.
>
> Bulk API with updates to all the documents should also work.
>
> Thanks
>   Vineeth
>
>
> On Tue, Sep 2, 2014 at 7:01 PM, John Smith  wrote:
>
>> Hi I need to add a new field (with a default) to all documents already
>> indexed.
>>
>> I guess bulk update will do it?
>>
>>


Add field to all documents of an index.

2014-09-02 Thread John Smith
Hi I need to add a new field (with a default) to all documents already 
indexed.

I guess bulk update will do it?
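For reference, a sketch of what the bulk-update body could look like (the 
index, type, ids, and field name are made up; each update is a pair of lines 
sent to the _bulk endpoint, and you would typically scan/scroll over the 
existing ids first):

```json
{ "update": { "_index": "myindex", "_type": "mytype", "_id": "1" } }
{ "doc": { "newField": "defaultValue" } }
{ "update": { "_index": "myindex", "_type": "mytype", "_id": "2" } }
{ "doc": { "newField": "defaultValue" } }
```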




Re: Does transport client do scatter gather?

2014-08-29 Thread John Smith
According to this...

http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/modules-node.html

Non-data nodes (I assume a node client is the equivalent of a non-data node) 
are capable of scatter/gather searching. I was wondering if the transport 
client can do this also?

2- Does the transport client support routing if you specify a routing field? 
Or does it always round-robin regardless?
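For intuition on question 2: routing means the routing value is hashed and 
taken modulo the shard count, so all documents with the same routing value 
land on one shard and a routed search only needs to hit that shard. A toy 
sketch (the djb-style hash below is a stand-in, not ES's actual hash 
function):

```python
def djb_hash(value: str) -> int:
    # djb2-style string hash, kept to 32 bits; a stand-in for ES's real hash.
    h = 5381
    for ch in value:
        h = ((h * 33) + ord(ch)) & 0xFFFFFFFF
    return h

def shard_for(routing: str, number_of_shards: int) -> int:
    # Shard selection: hash the routing value, take it modulo the shard count.
    return djb_hash(routing) % number_of_shards

# The same routing value always maps to the same shard.
same = shard_for("user-42", 5) == shard_for("user-42", 5)
```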
On Aug 29, 2014 12:09 PM, "joergpra...@gmail.com" 
wrote:

> I'm not exactly sure what you mean by scatter-gather, but yes, both
> clients can execute requests on all nodes of the cluster.
>
> Jörg
>
>
> On Fri, Aug 29, 2014 at 3:43 PM, John Smith 
> wrote:
>
>> Just as the subject asks or only the node client can do scatter gather?
>>
>> Thanks
>>



Does transport client do scatter gather?

2014-08-29 Thread John Smith
Just as the subject asks or only the node client can do scatter gather?

Thanks



Re: Elastic HQ not getting back vendor info from Elasticsearch.

2014-08-27 Thread John Smith
I know, but I think it's 2 issues...

1- The vendor info for client nodes is not returned. I assume that HQ is 
using the standard REST interface and Elasticsearch is not returning the info 
(maybe it's a Windows SIGAR/permission issue?). I posted the response 
JSON with the bug.
2- HQ doesn't gracefully handle the fact that the info is not returned. 
(It throws a JavaScript exception, but nothing shows up on the screen, so 
you think it's "frozen" when it's not.)

On Tuesday, 26 August 2014 19:06:35 UTC-4, Mark Walkom wrote:
>
> ElasticHQ is a community plugin, the ES devs can't help here.
>
> I have raised issues against ElasticHQ in the past and Roy has fixed them 
> pretty quickly :)
>
> Regards,
> Mark Walkom
>
> Infrastructure Engineer
> Campaign Monitor
> email: ma...@campaignmonitor.com 
> web: www.campaignmonitor.com
>
>
> On 27 August 2014 04:44, John Smith > 
> wrote:
>
>> I posted an issue with Elastic HQ here: 
>> https://github.com/royrusso/elasticsearch-HQ/issues/164
>>
>> But just in case maybe an Elastic dev can have a look and see if it's 
>> Elasticsearch issue or not.
>>
>> Thanks
>>



Elastic HQ not getting back vendor info from Elasticsearch.

2014-08-26 Thread John Smith
I posted an issue with Elastic HQ here: 
https://github.com/royrusso/elasticsearch-HQ/issues/164

But just in case maybe an Elastic dev can have a look and see if it's 
Elasticsearch issue or not.

Thanks





Native client strictly as client.

2014-08-25 Thread John Smith
Using 1.3.2

Just to be sure...

If using the Native client APIs...

If creating a node client, that client essentially becomes a node in the 
cluster and you can also proxy through it (as I see in the logs, it 
actually binds 9300 and 9200)?

If using the transport client, then it's strictly a client and no one else 
can connect to it or proxy through it?



Re: Simple howto stunnel for elasticsearch cluster.

2014-08-25 Thread John Smith
And yes, native API clients are nodes also, which allows them to become 
proxies. So then you need to stunnel-protect them also. Rinse and repeat lol

So...

1- For port 9300 bind to localhost
2- Put stunnel in front of port 9300 and configure all nodes the same way to 
have cluster node coms over SSL.
3- Restrict any access to 9300. (Clients can become proxy nodes, so if they 
are somewhere external to the ES cluster, then you could connect to them 
unauthenticated/non-SSL.)
4- a) For port 9200 bind to localhost and put Nginx in front as a reverse 
proxy (this is straight passthrough)
   b) Or use a 3rd-party plugin like the jetty plugin (you have to trust that 
the plugin is doing the right thing and has no bugs, plus plugins are not 
necessarily up to speed with the latest ES releases)

It's a bit cumbersome but this secures ES to the max. Also, this forces the 
use of an HTTP client, which means you lose some of the niceties you get with 
the native client. (Read more here: https://github.com/searchbox-io/Jest)







Simple howto stunnel for elasticsearch cluster.

2014-08-22 Thread John Smith
Ok so I think I figured it out and it seems to be working OK. Please feel 
free to publish this or improve upon it etc... Note: client certs have not 
been tested yet.

Software versions used (though I don't think it matters really)
Ubuntu 14.04
JDK 1.8_20
elasticsearch 1.3.2
stunnel4

This config is for a 2-node setup.


NODE 1


Required config changes to elasticsearch.yml

# First bind elasticsearch to localhost (this makes ES invisible to the outside world)
network.bind_host: 127.0.0.1
transport.tcp.port: 9300

# Since we are going to hide this node from the outside, we have to tell the rest of the nodes how it looks from the outside
network.publish_host: 
transport.publish_port: 9700

http.port: 9200

# Disable multicast
discovery.zen.ping.multicast.enabled: false

# Since we are hiding all the nodes behind stunnel we also need to proxy ES client requests through SSL.
# For each additional node add 127.0.0.1:970x where x is incremented by 1, i.e. 9702, 9703 etc...
# Connect to NODE 2
discovery.zen.ping.unicast.hosts: 127.0.0.1:9701

stunnel.conf on NODE 1

;Proxy ssl for tcp transport.
[es-transport]
accept = :9300
connect = 127.0.0.1:9300
cert = stunnel.pem

;Proxy ssl for http
[es-http]
accept = :9200
connect = 127.0.0.1:9200
cert = stunnel.pem

;ES clustering does some local discovery.
;Since stunnel binds its own ports, we pick an arbitrary port that is not used by other "systems/protocols".
;See the publish settings of elasticsearch.yml above.
[es-transport-local]
client = yes
accept = :9700
connect = :9300

; The ssl client tunnel for es to connect ssl to node 2.
[es-transport-node2]
client = yes
accept = 127.0.0.1:9701
connect = :9301

;For each additional node increment x by 1, I.e: 9702, 9703 etc...
[es-transport-nodex]
client = yes
accept = 127.0.0.1:970x
connect = :930x


NODE 2


Required config changes to elasticsearch.yml

# First bind elasticsearch to localhost (this makes ES invisible to the outside world)
network.bind_host: 127.0.0.1
transport.tcp.port: 9301

# Since we are going to hide this node from the outside, we have to tell the rest of the nodes how it looks from the outside
network.publish_host: 
transport.publish_port: 9701

http.port: 9200

# Disable multicast
discovery.zen.ping.multicast.enabled: false

# Since we are hiding all the nodes behind stunnel we also need to proxy ES client requests through SSL.
# For each additional node add 127.0.0.1:970x where x is incremented by 1, i.e. 9702, 9703 etc...
# Connect to NODE 1
discovery.zen.ping.unicast.hosts: 127.0.0.1:9700

stunnel.conf on NODE 2

;Proxy ssl for tcp transport.
[es-transport]
accept = :9301
connect = 127.0.0.1:9301
cert = stunnel.pem

;Proxy ssl for http
[es-http]
accept = :9200
connect = 127.0.0.1:9200
cert = stunnel.pem

;ES clustering does some local discovery.
;Since stunnel binds its own ports, we pick an arbitrary port that is not used by other "systems/protocols".
;See the publish settings of elasticsearch.yml above.
[es-transport-local]
client = yes
accept = :9701
connect = :9301


; The ssl client tunnel for es to connect ssl to node 1.
[es-transport-node1]
client = yes
accept = 127.0.0.1:9700
connect = :9300

;For each additional node increment x by 1, I.e: 9702, 9703 etc...
[es-transport-nodex]
client = yes
accept = 127.0.0.1:970x
connect = :930x
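For completeness, the stunnel.pem referenced above can be created as a 
self-signed certificate with the key and cert bundled in one file (the usual 
stunnel recipe; the CN here is a placeholder, use the node's hostname in 
practice):

```shell
# Generate an unencrypted key plus self-signed cert into one stunnel.pem.
openssl req -new -x509 -days 365 -nodes \
    -subj "/CN=es-node" \
    -out stunnel.pem -keyout stunnel.pem
# Keep the bundled private key readable only by the stunnel user.
chmod 600 stunnel.pem
```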






Re: Trying to setup stunnel for es.

2014-08-21 Thread John Smith
OK so I got it to work on a single node but I can't make it bridge for 
multi-node using unicast

On server 1 i have...

Stunnel config...
[es-server-native]
accept = 10.0.0.xx0:9300
connect = 127.0.0.1:9300
cert = stunnel.pem

elasticsearch.yml
network.host: 127.0.0.1
discovery.zen.ping.multicast.enabled: false
discovery.zen.ping.unicast.hosts: ["what do i put here since now the nodes 
are not accessible externally"]


On server 2 i have...

Stunnel config...
[es-server-native]
accept = 10.0.0.xx1:9300
connect = 127.0.0.1:9300
cert = stunnel.pem

elasticsearch.yml
network.host: 127.0.0.1
discovery.zen.ping.multicast.enabled: false
discovery.zen.ping.unicast.hosts: ["what do i put here since now the nodes 
are not accessible externally"]

If I try to make the local ES machine a client of itself and start 
Elasticsearch, I get a bind address error and Elasticsearch won't start.

On Server 1
[es-client-native]
client = yes
accept = 127.0.0.1:9300
connect = 10.0.0.xx1:9300

On Server 2
[es-client-native]
client = yes
accept = 127.0.0.1:9300
connect = 10.0.0.xx0:9300





Re: Trying to setup stunnel for es.

2014-08-21 Thread John Smith
I set network.host: 127.0.0.1

And now both bind and publish host are bound to 127.0.0.1 and there is no 
exceptions.

Is that right?

Now theoretically I should be able to stunnel my node client from another 
box through the 9500 port?



Trying to setup stunnel for es.

2014-08-21 Thread John Smith
I'm running:

Ubuntu 14.04
stunnel4
Elasticsearch 1.3.1

In Elasticsearch.yml I bind ES to localhost only.

network.bind_host: 127.0.0.1

In my stunnel config...
client = no
[elasticsearch]
accept = 9600
connect = 127.0.0.1:9300
cert = /etc/stunnel/stunnel.pem

Then I run elasticsearch. I have not tried to connect a client yet until I 
resolve the below exceptions...

[2014-08-21 10:09:17,511][INFO ][node ] [Archenemy] 
version[1.3.0], pid[31396], build[1265b14/2014-07-23T13:46:36Z]
[2014-08-21 10:09:17,511][INFO ][node ] [Archenemy] 
initializing ...
[2014-08-21 10:09:17,519][INFO ][plugins  ] [Archenemy] 
loaded [marvel], sites [marvel, HQ]
[2014-08-21 10:09:20,088][INFO ][node ] [Archenemy] 
initialized
[2014-08-21 10:09:20,088][INFO ][node ] [Archenemy] 
starting ...
[2014-08-21 10:09:20,329][INFO ][transport] [Archenemy] 
bound_address {inet[/127.0.0.1:9300]}, publish_address 
{inet[/10.0.0.xxx:9300]}
[2014-08-21 10:09:20,346][INFO ][discovery] [Archenemy] 
esdashboard/OoIM73WsQYmQANs5Z7TlgQ
[2014-08-21 10:09:20,444][WARN ][cluster.service  ] [Archenemy] 
failed to connect to node 
[[Archenemy][OoIM73WsQYmQANs5Z7TlgQ][xx][inet[/10.0.0.xxx:9300]]]
org.elasticsearch.transport.ConnectTransportException: 
[Archenemy][inet[/10.0.0.xxx:9300]] connect_timeout[30s]
at 
org.elasticsearch.transport.netty.NettyTransport.connectToChannels(NettyTransport.java:733)
at 
org.elasticsearch.transport.netty.NettyTransport.connectToNode(NettyTransport.java:662)
at 
org.elasticsearch.transport.netty.NettyTransport.connectToNode(NettyTransport.java:630)
at 
org.elasticsearch.transport.TransportService.connectToNode(TransportService.java:146)
at 
org.elasticsearch.cluster.service.InternalClusterService$UpdateTask.run(InternalClusterService.java:424)
at 
org.elasticsearch.common.util.concurrent.PrioritizedEsThreadPoolExecutor$TieBreakingPrioritizedRunnable.run(PrioritizedEsThreadPoolExecutor.java:153)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.net.ConnectException: Connection refused: /10.0.0.xxx:9300
at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
at 
sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:712)
at 
org.elasticsearch.common.netty.channel.socket.nio.NioClientBoss.connect(NioClientBoss.java:150)
at 
org.elasticsearch.common.netty.channel.socket.nio.NioClientBoss.processSelectedKeys(NioClientBoss.java:105)
at 
org.elasticsearch.common.netty.channel.socket.nio.NioClientBoss.process(NioClientBoss.java:79)
at 
org.elasticsearch.common.netty.channel.socket.nio.AbstractNioSelector.run(AbstractNioSelector.java:318)
at 
org.elasticsearch.common.netty.channel.socket.nio.NioClientBoss.run(NioClientBoss.java:42)
at 
org.elasticsearch.common.netty.util.ThreadRenamingRunnable.run(ThreadRenamingRunnable.java:108)
at 
org.elasticsearch.common.netty.util.internal.DeadLockProofWorker$1.run(DeadLockProofWorker.java:42)
... 3 more
[2014-08-21 10:09:23,357][INFO ][cluster.service  ] [Archenemy] 
new_master 
[Archenemy][OoIM73WsQYmQANs5Z7TlgQ][xx][inet[/10.0.0.xxx:9300]], 
reason: zen-disco-join (elected_as_master)
[2014-08-21 10:09:23,481][INFO ][http ] [Archenemy] 
bound_address {inet[/127.0.0.1:9200]}, publish_address 
{inet[/10.0.0.xxx:9200]}
[2014-08-21 10:09:23,482][INFO ][node ] [Archenemy] 
started
[2014-08-21 10:09:24,128][INFO ][gateway  ] [Archenemy] 
recovered [33] indices into cluster_state
[2014-08-21 10:10:20,158][WARN ][cluster.service  ] [Archenemy] 
failed to reconnect to node 
[Archenemy][OoIM73WsQYmQANs5Z7TlgQ][xx][inet[xx.xx.net/10.0.0.xxx:9300]]
org.elasticsearch.transport.ConnectTransportException: 
[Archenemy][inet[xx.xx.net/10.0.0.xxx:9300]] connect_timeout[30s]
at 
org.elasticsearch.transport.netty.NettyTransport.connectToChannels(NettyTransport.java:733)
at 
org.elasticsearch.transport.netty.NettyTransport.connectToNode(NettyTransport.java:662)
at 
org.elasticsearch.transport.netty.NettyTransport.connectToNode(NettyTransport.java:630)
at 
org.elasticsearch.transport.TransportService.connectToNode(TransportService.java:146)
at 
org.elasticsearch.cluster.service.InternalClusterService$ReconnectToNodes.run(InternalClusterService.java:537)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.net.
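Side note on the failure above: even with network.bind_host set to 127.0.0.1, ES 
still derives a publish_address from the machine's first non-loopback interface 
(10.0.0.xxx in the log), then tries to reach itself there and gets connection 
refused, since nothing is listening on that address. A minimal sketch of the fix, 
assuming you want everything on loopback behind stunnel (standard ES 1.x setting 
names):

```yaml
# elasticsearch.yml -- keep both the bind and the publish address on loopback
network.bind_host: 127.0.0.1
network.publish_host: 127.0.0.1
# or set both with one key:
# network.host: 127.0.0.1
```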

Re: Any reason for this package org.elasticsearch.common.netty.*?

2014-08-19 Thread John Smith
You mean Netty? But why would the regular Netty pipeline need to be custom? I
asked because I wanted to tackle writing a custom TCP transport.
On Aug 19, 2014 8:03 PM, "Ivan Brusic"  wrote:

> At one point Elasticsearch shaded several different libraries for various
> reasons, but thankfully this is no longer the case. From what I understand,
> the Netty classes you are referring to are custom classes built for
> Elasticsearch that are not packaged with Netty.
>
> Cheers,
>
> Ivan
>
>
> On Tue, Aug 19, 2014 at 6:48 AM, John Smith 
> wrote:
>
>> Look at the latest POM: ES depends on Netty 3.9.1, all well and good...
>>
>> But why is netty packaged within the Elasticsearch jar and not
>> distributed as a regular jar in the lib folder?
>>
>> Is ES maintaining its own version of Netty, or is it just to reduce the
>> number of jars distributed?
>>
>> --
>> You received this message because you are subscribed to the Google Groups
>> "elasticsearch" group.
>> To unsubscribe from this group and stop receiving emails from it, send an
>> email to elasticsearch+unsubscr...@googlegroups.com.
>> To view this discussion on the web visit
>> https://groups.google.com/d/msgid/elasticsearch/06a986e7-9d71-41fe-8d80-262d137baef2%40googlegroups.com
>> <https://groups.google.com/d/msgid/elasticsearch/06a986e7-9d71-41fe-8d80-262d137baef2%40googlegroups.com?utm_medium=email&utm_source=footer>
>> .
>> For more options, visit https://groups.google.com/d/optout.
>>
>



Any reason for this package org.elasticsearch.common.netty.*?

2014-08-19 Thread John Smith
Look at the latest POM: ES depends on Netty 3.9.1, all well and good...

But why is netty packaged within the Elasticsearch jar and not distributed 
as a regular jar in the lib folder?

Is ES maintaining its own version of Netty, or is it just to reduce the 
number of jars distributed?



Re: Can plugin be written for TCP transport?

2014-08-15 Thread John Smith
Of course understood.

I'm thinking subnetting/double firewall.

The DMZ will not have access to 9300, and the internal network will not have 
access to 9300 either; only the ES cluster's subnet will, for cluster functions.
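That split could be sketched as host firewall rules; in iptables-restore format it 
might look like the following (the subnets are placeholders I made up, not values 
from this thread):

```
# /etc/iptables/rules.v4 (sketch; 10.0.1.0/24 = ES cluster subnet, 10.0.2.0/24 = DMZ)
-A INPUT -p tcp --dport 9300 -s 10.0.1.0/24 -j ACCEPT   # transport: cluster nodes only
-A INPUT -p tcp --dport 9300 -j DROP
-A INPUT -p tcp --dport 9200 -s 10.0.2.0/24 -j ACCEPT   # HTTP: DMZ web layer only
-A INPUT -p tcp --dport 9200 -j DROP
```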


On Friday, 15 August 2014 05:04:35 UTC-4, Jörg Prante wrote:
>
> You can not protect from superuser access from within an app, except when 
> you are also a superuser, and can create obscure kernel capabilities to 
> protect an app from another superuser.
>
> Usually this is not solvable by technical solutions, it is a matter of 
> trust into the infrastructure (data center, network, staff).
>
> Jörg
>
>
>
> On Fri, Aug 15, 2014 at 12:03 AM, John Smith  > wrote:
>
>> Yeah I tried the foundit plugin and just stripped away all their code and 
>> it's just a pass through! But would def need to figure out the innards.
>>
>> We have web app in dmz and then ES at lower level. But again it's about 
>> protecting from the inside. When having a bunch of support staff that are 
>> capable of accessing the box for admin purposes and support... My admin guy 
>> is a smart dude and he understands you can only go so far! We just have to 
>> see if it's acceptable.
>>
>>
>> On Thursday, 14 August 2014 14:54:51 UTC-4, Jörg Prante wrote:
>>
>>> Yes, you can reimplement the transport layer. The config setting runs 
>>> beyond the transport layer and if you get into a mood you could even throw 
>>> port 9300 and all the netty code away.  Not that I recommend it - but this 
>>> is a very low-level, dark magic intrusion into the ES guts. You can break 
>>> everything into pieces, including breaking the discovery basics etc.
>>>
>>> Jörg
>>>
>>>
>>> On Wed, Aug 13, 2014 at 9:52 PM, John Smith  wrote:
>>>
>>>> I think what a lot of people want is more like this? If I understand 
>>>> correctly... This runs above the 9300 transport. So you still get all the 
>>>> goodies/zen discovery etc... Plus added authentication. Except this only 
>>>> works with found...
>>>>
>>>>
>>>>
>>>> On Wednesday, 13 August 2014 15:23:05 UTC-4, John Smith wrote:
>>>>>
>>>>> Hi thanks Jorg. 
>>>>>
>>>>> Just to be sure, creating a plugin and overriding: transport.type and 
>>>>> transport.service.type this allow us to create custom transport for TCP 
>>>>> 9300 transport or the HTTP 9200 transport?
>>>>>
>>>>>
>>>>> I don't have a problem with using 
>>>>>
>>>>> Subnet and firewall
>>>>> And
>>>>> Nginx or specified plugin.
>>>>> Or
>>>>> Custom application
>>>>>
>>>>> To secure ES from outside world. In fact that's pretty good.
>>>>>
>>>>> We run a web layer in DMZ and ES on a second layer. So only DMZ has 
>>>>> access to ES.
>>>>>
>>>>> The question is how do you secure ES from the inside world? Anyone who 
>>>>> has access to the subnet and to the ES cluster can just login and install 
>>>>> their own "jobs" that can use port 9300 and do whatever they want with 
>>>>> ES. 
>>>>> Even a simple authentication would be better than nothing here.
>>>>>
>>>>> If what you said above allows us to replace 9300 transport, then 
>>>>> awesome. The issue is Elasticsearch has built the really nice fast Ferrari 
>>>>> but opted out of providing even a simple door locking mechanism. Using 
>>>>> nginx is like telling your neighbor to keep an eye on your car while you 
>>>>> are away on the weekend and trusting they will do right.
>>>>>
>>>>> On Wednesday, 13 August 2014 14:21:01 UTC-4, Jörg Prante wrote:
>>>>>>
>>>>>> With the configuration settings "transport.type" and 
>>>>>> "transport.service.type"
>>>>>>
>>>>>> Example:
>>>>>>
>>>>>> transport.type: org.xbib.elasticsearch.transport.syslog.netty.
>>>>>> SyslogNettyTransportModule
>>>>>> transport.service.type: org.xbib.elasticsearch.transport.syslog.
>>>>>> SyslogTransportService
>>>>>>
>>>>>> you can implement your own transport layer.
>>>>>>
>>>>>> With this mechanism you can write plugins to create an audit trail of 
>

Re: Can plugin be written for TCP transport?

2014-08-14 Thread John Smith
Yeah I tried the foundit plugin and just stripped away all their code and 
it's just a pass through! But would def need to figure out the innards.

We have web app in dmz and then ES at lower level. But again it's about 
protecting from the inside. When having a bunch of support staff that are 
capable of accessing the box for admin purposes and support... My admin guy 
is a smart dude and he understands you can only go so far! We just have to 
see if it's acceptable.


On Thursday, 14 August 2014 14:54:51 UTC-4, Jörg Prante wrote:
>
> Yes, you can reimplement the transport layer. The config setting runs 
> beyond the transport layer and if you get into a mood you could even throw 
> port 9300 and all the netty code away.  Not that I recommend it - but this 
> is a very low-level, dark magic intrusion into the ES guts. You can break 
> everything into pieces, including breaking the discovery basics etc.
>
> Jörg
>
>
> On Wed, Aug 13, 2014 at 9:52 PM, John Smith  > wrote:
>
>> I think what a lot of people want is more like this? If I understand 
>> correctly... This runs above the 9300 transport. So you still get all the 
>> goodies/zen discovery etc... Plus added authentication. Except this only 
>> works with found...
>>
>>
>>
>> On Wednesday, 13 August 2014 15:23:05 UTC-4, John Smith wrote:
>>>
>>> Hi thanks Jorg. 
>>>
>>> Just to be sure, creating a plugin and overriding: transport.type and 
>>> transport.service.type this allow us to create custom transport for TCP 
>>> 9300 transport or the HTTP 9200 transport?
>>>
>>>
>>> I don't have a problem with using 
>>>
>>> Subnet and firewall
>>> And
>>> Nginx or specified plugin.
>>> Or
>>> Custom application
>>>
>>> To secure ES from outside world. In fact that's pretty good.
>>>
>>> We run a web layer in DMZ and ES on a second layer. So only DMZ has 
>>> access to ES.
>>>
>>> The question is how do you secure ES from the inside world? Anyone who 
>>> has access to the subnet and to the ES cluster can just login and install 
>>> their own "jobs" that can use port 9300 and do whatever they want with ES. 
>>> Even a simple authentication would be better than nothing here.
>>>
>>> If what you said above allows us to replace 9300 transport, then 
>>> awesome. The issue is Elasticsearch has built the really nice fast Ferrari 
>>> but opted out of providing even a simple door locking mechanism. Using 
>>> nginx is like telling your neighbor to keep an eye on your car while you 
>>> are away on the weekend and trusting they will do right.
>>>
>>> On Wednesday, 13 August 2014 14:21:01 UTC-4, Jörg Prante wrote:
>>>>
>>>> With the configuration settings "transport.type" and 
>>>> "transport.service.type"
>>>>
>>>> Example:
>>>>
>>>> transport.type: org.xbib.elasticsearch.transport.syslog.netty.
>>>> SyslogNettyTransportModule
>>>> transport.service.type: org.xbib.elasticsearch.transport.syslog.
>>>> SyslogTransportService
>>>>
>>>> you can implement your own transport layer.
>>>>
>>>> With this mechanism you can write plugins to create an audit trail of 
>>>> all clients or nodes that connect to your cluster and you can log what 
>>>> they 
>>>> did, for later revision. Or, you can add a registry for clients, add JAAS, 
>>>> ...
>>>>
>>>> For example, I played with a modified netty transport layer to dump all 
>>>> parsed transport actions between nodes to a remote host syslog, including 
>>>> the action names and the channel connection information of local and 
>>>> remote 
>>>> host/port.
>>>>
>>>> On such a custom transport layer implementation, you can add even more 
>>>> low level logic. If you do not want certain nodes or clients to connect, 
>>>> you could a) use zen unicast for manual configuration of permitted nodes 
>>>> or 
>>>> clients and/or b) reject all network actions from unknown/unregistered 
>>>> clients, independent of discovery.
>>>>
>>>> Note, manipulating transport layer is not free lunch and is not always 
>>>> fun. Performance may degrade, other things may break etc.
>>>>
>>>> Jörg
>>>>
>>>>
>>>>
>>>> On Wed, Aug 13, 2014 at 5:07 PM, John Smith  

Difference between transport and http modules?

2014-08-14 Thread John Smith
Just want to clear some things up...

The transport module is what is called the TCP transport module that runs 
on port 9300?
The transport module is what handles all the major networking functions of ES, 
i.e., do network functions like HTTP and zen discovery sit on top of the 
transport module?
The http module runs on port 9200?
The http module uses the transport module underneath?
Do the Java client APIs use the TCP transport or HTTP?
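As I understand it: yes, the transport module is the binary TCP protocol on 9300, 
and node-to-node traffic plus both Java clients (node client and TransportClient) 
use it, while the HTTP/REST module listens on 9200. Both ports can be pinned (or 
HTTP disabled) in elasticsearch.yml with standard ES 1.x settings:

```yaml
transport.tcp.port: 9300   # node-to-node and Java client (node/TransportClient) traffic
http.port: 9200            # REST API
# http.enabled: false      # optionally disable the HTTP module on pure data nodes
```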



Re: Can plugin be written for TCP transport?

2014-08-13 Thread John Smith
I think what a lot of people want is more like this? If I understand 
correctly... This runs above the 9300 transport. So you still get all the 
goodies/zen discovery etc... Plus added authentication. Except this only 
works with found...



On Wednesday, 13 August 2014 15:23:05 UTC-4, John Smith wrote:
>
> Hi thanks Jorg. 
>
> Just to be sure, creating a plugin and overriding: transport.type and 
> transport.service.type this allow us to create custom transport for TCP 
> 9300 transport or the HTTP 9200 transport?
>
>
> I don't have a problem with using 
>
> Subnet and firewall
> And
> Nginx or specified plugin.
> Or
> Custom application
>
> To secure ES from outside world. In fact that's pretty good.
>
> We run a web layer in DMZ and ES on a second layer. So only DMZ has access 
> to ES.
>
> The question is how do you secure ES from the inside world? Anyone who has 
> access to the subnet and to the ES cluster can just login and install their 
> own "jobs" that can use port 9300 and do whatever they want with ES. Even 
> a simple authentication would be better than nothing here.
>
> If what you said above allows us to replace 9300 transport, then awesome. 
> The issue is Elasticsearch has built the really nice fast Ferrari but opted 
> out of providing even a simple door locking mechanism. Using nginx is like 
> telling your neighbor to keep an eye on your car while you are away on the 
> weekend and trusting they will do right.
>
> On Wednesday, 13 August 2014 14:21:01 UTC-4, Jörg Prante wrote:
>>
>> With the configuration settings "transport.type" and 
>> "transport.service.type"
>>
>> Example:
>>
>> transport.type: 
>> org.xbib.elasticsearch.transport.syslog.netty.SyslogNettyTransportModule
>> transport.service.type: 
>> org.xbib.elasticsearch.transport.syslog.SyslogTransportService
>>
>> you can implement your own transport layer.
>>
>> With this mechanism you can write plugins to create an audit trail of all 
>> clients or nodes that connect to your cluster and you can log what they 
>> did, for later revision. Or, you can add a registry for clients, add JAAS, 
>> ...
>>
>> For example, I played with a modified netty transport layer to dump all 
>> parsed transport actions between nodes to a remote host syslog, including 
>> the action names and the channel connection information of local and remote 
>> host/port.
>>
>> On such a custom transport layer implementation, you can add even more 
>> low level logic. If you do not want certain nodes or clients to connect, 
>> you could a) use zen unicast for manual configuration of permitted nodes or 
>> clients and/or b) reject all network actions from unknown/unregistered 
>> clients, independent of discovery.
>>
>> Note, manipulating transport layer is not free lunch and is not always 
>> fun. Performance may degrade, other things may break etc.
>>
>> Jörg
>>
>>
>>
>> On Wed, Aug 13, 2014 at 5:07 PM, John Smith  wrote:
>>
>>> That's what I was thinking...
>>>
>>> 1- I would like this java app to use the node client, because I like the 
>>> fact that there's no extra hop and automatic failover to the next node.
>>> 2- I figure it would be a firewall setting/socks to only allow the java 
>>> app to connect to ES. But again here anyone can go create a node client on 
>>> the same machine and pull data anonymously.
>>>
>>> I know any one person can log in a machine at any time and any person 
>>> can read regardless and it's ok, the data is supposed to be read but at 
>>> least you know who read it and when. That's not an issue... Security is a 
>>> best effort, but the issue is the audit process and how well you can check 
>>> if all your eggs are there.
>>>
>>> Even if I do exactly as you said, subnet plus socks proxy, someone can 
>>> still go to that machine create their own node client and bypass the java 
>>> app with no direct trace. This will probably never happen, but all it takes 
>>> is one angry employee.
>>>
>>>
>>>
>>>
>>>
>>>
>>> On Wednesday, 13 August 2014 09:50:25 UTC-4, Jörg Prante wrote:
>>>
>>>> You can write a Java app to authorize access with JAAS and use a SOCKS 
>>>> proxy to connect to an ES cluster in a private subnet. That is all a 
>>>> matter 
>>>> of network configuration, there is nothing that requires the effort of an 
>>>> extra ES plugin.
>>>>
>>>> 

Re: Can plugin be written for TCP transport?

2014-08-13 Thread John Smith
Hi thanks Jorg. 

Just to be sure, creating a plugin and overriding: transport.type and 
transport.service.type this allow us to create custom transport for TCP 
9300 transport or the HTTP 9200 transport?


I don't have a problem with using 

Subnet and firewall
And
Nginx or specified plugin.
Or
Custom application

To secure ES from outside world. In fact that's pretty good.

We run a web layer in DMZ and ES on a second layer. So only DMZ has access 
to ES.

The question is how do you secure ES from the inside world? Anyone who has 
access to the subnet and to the ES cluster can just login and install their 
own "jobs" that can use port 9300 and do whatever they want with ES. Even 
a simple authentication would be better than nothing here.

If what you said above allows us to replace 9300 transport, then awesome. 
The issue is Elasticsearch has built the really nice fast Ferrari but opted 
out of providing even a simple door locking mechanism. Using nginx is like 
telling your neighbor to keep an eye on your car while you are away on the 
weekend and trusting they will do right.

On Wednesday, 13 August 2014 14:21:01 UTC-4, Jörg Prante wrote:
>
> With the configuration settings "transport.type" and 
> "transport.service.type"
>
> Example:
>
> transport.type: 
> org.xbib.elasticsearch.transport.syslog.netty.SyslogNettyTransportModule
> transport.service.type: 
> org.xbib.elasticsearch.transport.syslog.SyslogTransportService
>
> you can implement your own transport layer.
>
> With this mechanism you can write plugins to create an audit trail of all 
> clients or nodes that connect to your cluster and you can log what they 
> did, for later revision. Or, you can add a registry for clients, add JAAS, 
> ...
>
> For example, I played with a modified netty transport layer to dump all 
> parsed transport actions between nodes to a remote host syslog, including 
> the action names and the channel connection information of local and remote 
> host/port.
>
> On such a custom transport layer implementation, you can add even more low 
> level logic. If you do not want certain nodes or clients to connect, you 
> could a) use zen unicast for manual configuration of permitted nodes or 
> clients and/or b) reject all network actions from unknown/unregistered 
> clients, independent of discovery.
>
> Note, manipulating transport layer is not free lunch and is not always 
> fun. Performance may degrade, other things may break etc.
>
> Jörg
>
>
>
> On Wed, Aug 13, 2014 at 5:07 PM, John Smith  > wrote:
>
>> That's what I was thinking...
>>
>> 1- I would like this java app to use the node client, because I like the 
>> fact that there's no extra hop and automatic failover to the next node.
>> 2- I figure it would be a firewall setting/socks to only allow the java 
>> app to connect to ES. But again here anyone can go create a node client on 
>> the same machine and pull data anonymously.
>>
>> I know any one person can log in a machine at any time and any person can 
>> read regardless and it's ok, the data is supposed to be read but at least 
>> you know who read it and when. That's not an issue... Security is a best 
>> effort, but the issue is the audit process and how well you can check if 
>> all your eggs are there.
>>
>> Even if I do exactly as you said, subnet plus socks proxy, someone can 
>> still go to that machine create their own node client and bypass the java 
>> app with no direct trace. This will probably never happen, but all it takes 
>> is one angry employee.
>>
>>
>>
>>
>>
>>
>> On Wednesday, 13 August 2014 09:50:25 UTC-4, Jörg Prante wrote:
>>
>>> You can write a Java app to authorize access with JAAS and use a SOCKS 
>>> proxy to connect to an ES cluster in a private subnet. That is all a matter 
>>> of network configuration, there is nothing that requires the effort of an 
>>> extra ES plugin.
>>>
>>> Jörg
>>>
>>>
>>> On Wed, Aug 13, 2014 at 3:38 PM, John Smith  wrote:
>>>
>>>> Hi I have been looking at the various transport plugins. Correct me if 
>>>> I am wrong but those are for the http rest interface... Can plugins be 
>>>> written for the node transport?
>>>>
>>>> Basically this leads to securing ES. My ES is definitely not public 
>>>> and I know I can use reverse proxies or one of the http plugins... But 
>>>> what 
>>>> about client/programs connecting directly as nodes?
>>>>
>>>> Basically I need user auth and some form of ACL. SSL is secondary. 
>>>

Re: Can plugin be written for TCP transport?

2014-08-13 Thread John Smith
It would be cool if we could use something like Jetty plugin and disable 
client access from the TCP node and only use the TCP node for cluster 
related work.

On Wednesday, 13 August 2014 11:07:45 UTC-4, John Smith wrote:
>
> That's what I was thinking...
>
> 1- I would like this java app to use the node client, because I like the 
> fact that there's no extra hop and automatic failover to the next node.
> 2- I figure it would be a firewall setting/socks to only allow the java 
> app to connect to ES. But again here anyone can go create a node client on 
> the same machine and pull data anonymously.
>
> I know any one person can log in a machine at any time and any person can 
> read regardless and it's ok, the data is supposed to be read but at least 
> you know who read it and when. That's not an issue... Security is a best 
> effort, but the issue is the audit process and how well you can check if 
> all your eggs are there.
>
> Even if I do exactly as you said, subnet plus socks proxy, someone can 
> still go to that machine create their own node client and bypass the java 
> app with no direct trace. This will probably never happen, but all it takes 
> is one angry employee.
>
>
>
>
>
>
> On Wednesday, 13 August 2014 09:50:25 UTC-4, Jörg Prante wrote:
>>
>> You can write a Java app to authorize access with JAAS and use a SOCKS 
>> proxy to connect to an ES cluster in a private subnet. That is all a matter 
>> of network configuration, there is nothing that requires the effort of an 
>> extra ES plugin.
>>
>> Jörg
>>
>>
>> On Wed, Aug 13, 2014 at 3:38 PM, John Smith  wrote:
>>
>>> Hi I have been looking at the various transport plugins. Correct me if I 
>>> am wrong but those are for the http rest interface... Can plugins be 
>>> written for the node transport?
>>>
>>> Basically this leads to securing ES. My ES is definitely not public and 
>>> I know I can use reverse proxies or one of the http plugins... But what 
>>> about client/programs connecting directly as nodes?
>>>
>>> Basically I need user auth and some form of ACL. SSL is secondary. Also 
>>> need to be able to audit the user access. Dealing with credit card data. So 
>>> I need to know 100% who is accessing the data.
>>>
>>> So...
>>> What are some good steps to secure my ES cluster!?
>>>
>>>
>>
>>



Re: Can plugin be written for TCP transport?

2014-08-13 Thread John Smith
That's what I was thinking...

1- I would like this java app to use the node client, because I like the 
fact that there's no extra hop and automatic failover to the next node.
2- I figure it would be a firewall setting/socks to only allow the java app 
to connect to ES. But again here anyone can go create a node client on the 
same machine and pull data anonymously.

I know any one person can log in a machine at any time and any person can 
read regardless and it's ok, the data is supposed to be read but at least 
you know who read it and when. That's not an issue... Security is a best 
effort, but the issue is the audit process and how well you can check if 
all your eggs are there.

Even if I do exactly as you said, subnet plus socks proxy, someone can 
still go to that machine create their own node client and bypass the java 
app with no direct trace. This will probably never happen, but all it takes 
is one angry employee.






On Wednesday, 13 August 2014 09:50:25 UTC-4, Jörg Prante wrote:
>
> You can write a Java app to authorize access with JAAS and use a SOCKS 
> proxy to connect to an ES cluster in a private subnet. That is all a matter 
> of network configuration, there is nothing that requires the effort of an 
> extra ES plugin.
>
> Jörg
>
>
> On Wed, Aug 13, 2014 at 3:38 PM, John Smith  > wrote:
>
>> Hi I have been looking at the various transport plugins. Correct me if I 
>> am wrong but those are for the http rest interface... Can plugins be 
>> written for the node transport?
>>
>> Basically this leads to securing ES. My ES is definitely not public and 
>> I know I can use reverse proxies or one of the http plugins... But what 
>> about client/programs connecting directly as nodes?
>>
>> Basically I need user auth and some form of ACL. SSL is secondary. Also 
>> need to be able to audit the user access. Dealing with credit card data. So 
>> I need to know 100% who is accessing the data.
>>
>> So...
>> What are some good steps to secure my ES cluster!?
>>
>>
>
>



Can plugin be written for TCP transport?

2014-08-13 Thread John Smith
Hi I have been looking at the various transport plugins. Correct me if I am 
wrong but those are for the http rest interface... Can plugins be written for 
the node transport?

Basically this leads to securing ES. My ES is definitely not public and I know 
I can use reverse proxies or one of the http plugins... But what about 
client/programs connecting directly as nodes?

Basically I need user auth and some form of ACL. SSL is secondary. Also need 
to be able to audit the user access. Dealing with credit card data. So I need 
to know 100% who is accessing the data.

So...
What are some good steps to secure my ES cluster!?



Re: Weird Exception.

2014-08-11 Thread John Smith
Ok set my sysctl and user limits to 65536. Node restarted seems to be 
recovering so far...

On Monday, 11 August 2014 09:37:33 UTC-4, John Smith wrote:
>
> Oops, I switched Linux boxes and forgot to set the file limit. Let me see if 
> that works :)
>
> On Monday, 11 August 2014 09:33:54 UTC-4, John Smith wrote:
>>
>> And I see this also... [quoted log excerpt and stack trace trimmed; the 
>> full trace appears in the original message below]


Re: Weird Exception.

2014-08-11 Thread John Smith
Oops, switched Linux boxes and forgot to set the file limit. Let me see if 
that works :)

On Monday, 11 August 2014 09:33:54 UTC-4, John Smith wrote:
>
> And I see this also... [quoted log excerpt and stack trace trimmed; the 
> full trace appears in the original message below]

Re: Weird Exception.

2014-08-11 Thread John Smith
And I see this also...

[2014-08-11 09:31:35,233][WARN ][cluster.action.shard ] [Scarlet 
Spiders] [.marvel-2014.08.04][0] sending failed shard for 
[.marvel-2014.08.04][0], node[TsoETYSERg-DNDpiDxpxKA], [R], 
s[INITIALIZING], indexUUID [gxfk0pCiQg2QCJRyUWAYTw], reason [engine 
failure, message [corrupted preexisting 
index][FileSystemException[/home/elasticsearch/elasticsearch-1.3.0/data/esdashboard/nodes/0/indices/.marvel-2014.08.04/0/index/_2bf.si:
 
Too many open files]]]
[2014-08-11 09:31:35,393][WARN ][index.engine.internal] [Scarlet 
Spiders] [.marvel-2014.07.31][0] failed engine [corrupted preexisting index]
[2014-08-11 09:31:35,394][WARN ][indices.cluster  ] [Scarlet 
Spiders] [.marvel-2014.07.31][0] failed to start shard
java.nio.file.FileSystemException: 
/home/elasticsearch/elasticsearch-1.3.0/data/esdashboard/nodes/0/indices/.marvel-2014.07.31/0/index/_3ns.si:
 
Too many open files
at 
sun.nio.fs.UnixException.translateToIOException(UnixException.java:91)
at 
sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:102)
at 
sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:107)
at 
sun.nio.fs.UnixFileSystemProvider.newFileChannel(UnixFileSystemProvider.java:177)
at java.nio.channels.FileChannel.open(FileChannel.java:287)
at java.nio.channels.FileChannel.open(FileChannel.java:335)
at 
org.apache.lucene.store.NIOFSDirectory.openInput(NIOFSDirectory.java:81)
at 
org.apache.lucene.store.FileSwitchDirectory.openInput(FileSwitchDirectory.java:172)
at 
org.apache.lucene.store.FilterDirectory.openInput(FilterDirectory.java:80)
at 
org.elasticsearch.index.store.DistributorDirectory.openInput(DistributorDirectory.java:130)
at 
org.apache.lucene.store.Directory.openChecksumInput(Directory.java:113)
at 
org.apache.lucene.codecs.lucene46.Lucene46SegmentInfoReader.read(Lucene46SegmentInfoReader.java:49)
at org.apache.lucene.index.SegmentInfos.read(SegmentInfos.java:361)
at 
org.apache.lucene.index.SegmentInfos$1.doBody(SegmentInfos.java:457)
at 
org.apache.lucene.index.SegmentInfos$FindSegmentsFile.run(SegmentInfos.java:907)
at 
org.apache.lucene.index.SegmentInfos$FindSegmentsFile.run(SegmentInfos.java:753)
at org.apache.lucene.index.SegmentInfos.read(SegmentInfos.java:453)
at 
org.elasticsearch.common.lucene.Lucene.readSegmentInfos(Lucene.java:96)
at 
org.elasticsearch.index.store.Store.readLastCommittedSegmentsInfo(Store.java:124)
at org.elasticsearch.index.store.Store.access$300(Store.java:74)
at 
org.elasticsearch.index.store.Store$MetadataSnapshot.buildMetadata(Store.java:442)
at 
org.elasticsearch.index.store.Store$MetadataSnapshot.(Store.java:433)
at org.elasticsearch.index.store.Store.getMetadata(Store.java:144)
at 
org.elasticsearch.indices.cluster.IndicesClusterStateService.applyInitializingShard(IndicesClusterStateService.java:724)
at 
org.elasticsearch.indices.cluster.IndicesClusterStateService.applyNewOrUpdatedShards(IndicesClusterStateService.java:576)
at 
org.elasticsearch.indices.cluster.IndicesClusterStateService.clusterChanged(IndicesClusterStateService.java:183)
at 
org.elasticsearch.cluster.service.InternalClusterService$UpdateTask.run(InternalClusterService.java:444)
at 
org.elasticsearch.common.util.concurrent.PrioritizedEsThreadPoolExecutor$TieBreakingPrioritizedRunnable.run(PrioritizedEsThreadPoolExecutor.java:153)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
[2014-08-11 09:31:35,397][WARN ][cluster.action.shard ] [Scarlet 
Spiders] [.marvel-2014.07.31][0] sending failed shard for 
[.marvel-2014.07.31][0], node[TsoETYSERg-DNDpiDxpxKA], [R], 
s[INITIALIZING], indexUUID [hsU3ZVo3T0OlYreN3mZ5aQ], reason [Failed to 
start shard, message 
[FileSystemException[/home/elasticsearch/elasticsearch-1.3.0/data/esdashboard/nodes/0/indices/.marvel-2014.07.31/0/index/_3ns.si:
 
Too many open files]]]
[2014-08-11 09:31:35,399][WARN ][cluster.action.shard ] [Scarlet 
Spiders] [.marvel-2014.07.31][0] sending failed shard for 
[.marvel-2014.07.31][0], node[TsoETYSERg-DNDpiDxpxKA], [R], 
s[INITIALIZING], indexUUID [hsU3ZVo3T0OlYreN3mZ5aQ], reason [engine 
failure, message [corrupted preexisting 
index][FileSystemException[/home/elasticsearch/elasticsearch-1.3.0/data/esdashboard/nodes/0/indices/.marvel-2014.07.31/0/index/_3ns.si:
 
Too many open files]]]



On Monday, 11 August 2014 09:31:50 UTC-4, John Smith wrote:
>
> Hi, using Elasticsearch 1.3.0, Java 1.8_5, configured for 32g
>
> [2014-08-11 09:23:16,258][WARN ][cluster.action.shard ] [Tyrant] 
> [.marvel-2014.08.04][0] received shard failed for [.marvel-2014.08.04][0], 

Weird Exception.

2014-08-11 Thread John Smith
Hi, using Elasticsearch 1.3.0, Java 1.8_5, configured for 32g

[2014-08-11 09:23:16,258][WARN ][cluster.action.shard ] [Tyrant] 
[.marvel-2014.08.04][0] received shard failed for [.marvel-2014.08.04][0], 
node[TsoETYSERg-DNDpiDxpxKA], [R], s[INITIALIZING], indexUUID 
[gxfk0pCiQg2QCJRyUWAYTw], reason [Failed to create shard, message 
[IndexShardCreationException[[.marvel-2014.08.04][0] failed to create 
shard]; nested: IOException[directory 
'/.../.../elasticsearch-1.3.0/data/esdashboard/nodes/0/indices/.marvel-2014.08.04/0/index'
 
exists and is a directory, but cannot be listed: list() returned null]; ]]

Also 1 of the 32 cores is at 100% CPU



Just a bit of community fun! Post your node names!

2014-08-07 Thread John Smith
Son of Satan
Squidboy




Re: Calculating the avg time between two time stamps

2014-07-24 Thread John Smith
Cool! So for the new feature...

All I need is an empty field called "avgRespTime", and in my mapping I define a 
transform for this field which computes the time difference between the two 
timestamps?
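If it helps, a mapping along these lines is roughly the shape the 1.x transform feature expects; the type name "request", the field names, and the Groovy expression here are all illustrative assumptions to check against the transform docs, not something confirmed in the thread:

```python
# Hypothetical ES 1.x index mapping using a "transform" script to fill a
# derived field at index time. Type name "request", field names, and the
# Groovy expression are illustrative assumptions.
mapping = {
    "mappings": {
        "request": {
            "transform": {
                "script": "ctx._source['respTimeMs'] ="
                          " ctx._source['stampEnd'] - ctx._source['stampStart']",
                "lang": "groovy",
            },
            "properties": {
                "stampStart": {"type": "date"},
                "stampEnd": {"type": "date"},
                "respTimeMs": {"type": "long"},
            },
        }
    }
}
print(mapping["mappings"]["request"]["transform"]["lang"])
```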



Calculating the avg time between two time stamps

2014-07-24 Thread John Smith
Using ES 1.2.2


The aggregation below gives me my average requests per second and the 
average response time for each second.
Is this the only way to do it, or is there a better way, since scripting is a 
bit slow?

"aggs": {
  "tps": {
    "date_histogram": {
      "field": "stampStart",
      "interval": "second",
      "order": {
        "_key": "asc"
      }
    },
    "aggs": {
      "avg_resp": {
        "avg": {
          "script": "doc.stampEnd.value - doc.stampStart.value"
        }
      }
    }
  }
}
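An alternative that avoids the per-document script entirely is to compute the duration once at index time and aggregate on the stored number. A minimal client-side sketch, assuming stampStart/stampEnd are epoch milliseconds and "respTimeMs" is a hypothetical field name:

```python
def with_resp_time(doc):
    """Return a copy of the document with the precomputed duration added."""
    out = dict(doc)
    # Duration in milliseconds, assuming epoch-millisecond timestamps.
    out["respTimeMs"] = out["stampEnd"] - out["stampStart"]
    return out

doc = {"stampStart": 1406217600000, "stampEnd": 1406217600150}
print(with_resp_time(doc)["respTimeMs"])  # 150
```

The avg sub-aggregation can then target `"field": "respTimeMs"` instead of a script, which is typically much cheaper per bucket.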



Re: Does insert order matter for date range queries

2014-07-17 Thread John Smith
Thanks

On Tuesday, 15 July 2014 11:49:56 UTC-4, Nikolas Everett wrote:
>
> I don't believe it matters, no.
>
>
> On Tue, Jul 15, 2014 at 11:47 AM, John Smith wrote:
>
>> Say I insert a few documents that have my own "date" field (NOT the ES 
>> insert stamp) but not inserted in order of that specific date field.
>>
>> {
>> ...
>>  "DateMoved": "2014-12-31..."
>> }
>>
>> {
>> ...
>>  "DateMoved": "2013-12-31..."
>> }
>>
>> {
>> ...
>>  "DateMoved": "2014-12-25..."
>> }
>>
>> {
>> ...
>>  "DateMoved": "2012-12-25..."
>> }
>>
>> {
>> ...
>>  "DateMoved": "2013-12-25..."
>> }
>>
>>
>>
>> And so on...
>>
>> If i wanted to do a range query by DateMoved (For all documents in 
>> 2013-12) would it affect the speed of the query?
>>
>> I have been testing my query and seems to be running ok. But just double 
>> checking to see there's no caveats.



Does insert order matter for date range queries

2014-07-15 Thread John Smith
Say I insert a few documents that have my own "date" field (NOT the ES 
insert stamp) but not inserted in order of that specific date field.

{
...
 "DateMoved": "2014-12-31..."
}

{
...
 "DateMoved": "2013-12-31..."
}

{
...
 "DateMoved": "2014-12-25..."
}

{
...
 "DateMoved": "2012-12-25..."
}

{
...
 "DateMoved": "2013-12-25..."
}



And so on...

If I wanted to do a range query by DateMoved (for all documents in 2013-12), 
would it affect the speed of the query?

I have been testing my query and it seems to be running OK, but I'm just 
double-checking that there are no caveats.
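For reference, the 2013-12 query above can be written as a plain range query; half-open bounds (gte/lt) are a safe way to cover the whole month. This is a generic sketch, not taken from the thread:

```python
# Range query matching every document whose DateMoved falls in Dec 2013.
# Insert order of the documents has no bearing on how this executes.
range_query = {
    "query": {
        "range": {
            "DateMoved": {
                "gte": "2013-12-01",
                "lt": "2014-01-01",
            }
        }
    }
}
print(range_query["query"]["range"]["DateMoved"])
```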



Does order matter when doing range searches by date?

2014-07-15 Thread John Smith
Say I insert a few documents that have a "date" field (not the ES stamp), but 
the documents are not inserted in order of that date.

{
...
 "DateMoved": "2014-12-31..."
}

{
...
 "DateMoved": "2013-12-31..."
}

{
...
 "DateMoved": "2014-12-25..."
}

{
...
 "DateMoved": "2012-12-25..."
}

{
...
 "DateMoved": "2013-12-25..."
}



And so on...

If I wanted to do a range query by DateMoved (for all documents in 2013-12), 
would it affect the speed of the query?



Re: What types of SSDs?

2014-07-14 Thread John Smith
So I got my server with SAS.

It's an HP DL 380P G7:
2 x 6 cores (hyperthreaded, 24 logical cores), 72GB RAM, and 5 Intel 530 SSDs (RAID 10)

These are the stats while JMeter is pushing 3,500 indexing operations/sec; 
average document size is 2,500 bytes.

Indexing - Index: 1.98ms
Indexing - Delete: 0ms
Search - Query: 9.81ms
Search - Fetch: 0.62ms
Get - Total: 0ms
Get - Exists: 0ms
Get - Missing: 0ms
Refresh: 215.91ms
Flush: 532.62ms
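The hot/cold allocation idea Mark suggests in the quoted reply below boils down to tagging nodes and steering indices with shard allocation filtering; a sketch of the two settings involved (the tag values are arbitrary, and the setting names should be checked against the 1.x allocation-filtering docs):

```python
# elasticsearch.yml on each node (expressed here as a dict for brevity):
node_settings = {"node.tag": "hot"}   # "cold" on the spinning-disk nodes

# Per-index setting that pins an index's shards onto the tagged nodes:
index_settings = {"index.routing.allocation.include.tag": "hot"}

print(node_settings["node.tag"], index_settings["index.routing.allocation.include.tag"])
```

Flipping the index setting from "hot" to "cold" later migrates the shards off the SSDs without reindexing.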



On Friday, 11 July 2014 19:29:17 UTC-4, Mark Walkom wrote:
>
> You could setup a hot and cold based allocation system, put your highly 
> accessed (hot) indexes on the SSDs and then the rest on the spinning disk.
>
> Regards,
> Mark Walkom
>
> Infrastructure Engineer
> Campaign Monitor
> email: ma...@campaignmonitor.com 
> web: www.campaignmonitor.com
>  
>
> On 11 July 2014 23:35, John Smith wrote:
>
>> Right now I have 4 boxes...
>>
>> 2x 32 cores 200GB RAM with RAID10 SATA1 + the Fusion IO
>>
>> 2x 24 cores 96GB RAM with RAID10 SAS but regular mechanical drives.
>>
>> I only test them as pairs. So it's clusters of 2
>>
>> On the surface all searches seem to perform quite close to each other. 
>> Only when looking at the stats in HQ and Marvel the true story is told. For 
>> instance most warnings with Fusion IO are yellow at best. While with the 
>> SAS Raid 10 (Regular SATA Drives) they reach red.
>>
>> I'm hoping I can get some regular SSDs to put in the SAS boxes and see 
>> if they do better.
>>
>>
>>
>>
>> On Thursday, 10 July 2014 18:00:11 UTC-4, Jörg Prante wrote:
>>>
>>>  Did you consider SSD with RAID0 (Linux, ext4, noatime) and SAS2 (6g/s) 
>>> or SAS3 (12g/s) controller?
>>>
>>> I have for personal use at home LSI SAS 2008 of 4x128g SSD RAID0 with 
>>> sustained 800 MB/s write and 950 MB/s read, on a commodity dual AMD C32 
>>> socket server mainboard. I do not test with JMeter but on this single node 
>>> hardware alone I observe 15k bulk index operations per second, and 
>>> scan/scroll over 45m docs takes less than 70 min.
>>>
>>> I'm waiting until SAS3 is affordable for me. For the future I have on my 
>>> list: LSI SAS 3008 HBA and SAS3 SSDs. For personal home use, Fusion IO is 
>>> too heavy for my wallet. Even for commercial purpose I do not consider it 
>>> as a cost effective solution.
>>>
>>> Just a note: if you want to spend your money to accelerate ES, buy RAM. You 
>>> will get more performance than from drives. Reason is the lower latency. 
>>> Low latency will speed up applications like ES more than the fastest I/O 
>>> drive is able to. That reminds me that I'm waiting since ages for DDR4 
>>> RAM...
>>>
>>> Jörg
>>>
>>>
>>> On Thu, Jul 10, 2014 at 10:13 PM, John Smith wrote:
>>>
>>>> Using 1.2.1
>>>>
>>>> I know each system and functionality is different but just curious when 
>>>> people say buy SSDs for ES, what types of SSDs are they buying?
>>>>
>>>> Fortunately for me I had some Fusion IO cards to test with, but just 
>>>> wondering if it's worth the price and if I should look into off the shelf 
>>>> SSDs like Samsung EVOs using SAS instead of pure SATA.
>>>>
>>>> So far from my testing it seems that all search operation regardless of 
>>>> the drive type seem to return in the same amount of time. So I suppose 
>>>> caching is playing a huge part here.
>>>>
>>>> Though when looking at the HQ indexing stats like query time, fetch 
>>>> time, refresh time, etc., the Fusion IO fares a bit better than regular 
>>>> SSDs using SATA.
>>>>
>>>> For instance refresh time for Fusion IO is 250ms while for regular SSDs 
>>>> (SATA NOT SAS, will test SAS when I get a chance) it's just above 1 second.
>>>> Even with fusion IO I do see some warnings on the index stats, but 
>>>> slightly better than regular SSDs
>>>>
>>>> Some strategies I picked for my indexes...
>>>> - New index per day, plus routing by "user"
>>>> - New index per day for monster users.
>>>>
>>>> Using JMeter to test...
>>>> - Achieved 3,500 index operations per second (Not bulk) avg document 
>>>> size 2,500 bytes (Fusion IO seemed to perform a bit better)
>>>> - Created a total of 25 indexes totaling over 100,000,000 documents 
>>>> anywhere between 3,000,000 to 5,000,000 docum

Re: What types of SSDs?

2014-07-11 Thread John Smith
Right now I have 4 boxes...

2x 32 cores 200GB RAM with RAID10 SATA1 + the Fusion IO

2x 24 cores 96GB RAM with RAID10 SAS but regular mechanical drives.

I only test them as pairs. So it's clusters of 2

On the surface, all searches seem to perform quite close to each other. Only 
when looking at the stats in HQ and Marvel is the true story told. For 
instance, most warnings with Fusion IO are yellow at best, while with the 
SAS RAID 10 (regular SATA drives) they reach red.

I'm hoping I can get some regular SSDs to put in the SAS boxes and see if 
they do better.




On Thursday, 10 July 2014 18:00:11 UTC-4, Jörg Prante wrote:
>
> Did you consider SSD with RAID0 (Linux, ext4, noatime) and SAS2 (6g/s) or 
> SAS3 (12g/s) controller?
>
> I have for personal use at home LSI SAS 2008 of 4x128g SSD RAID0 with 
> sustained 800 MB/s write and 950 MB/s read, on a commodity dual AMD C32 
> socket server mainboard. I do not test with JMeter but on this single node 
> hardware alone I observe 15k bulk index operations per second, and 
> scan/scroll over 45m docs takes less than 70 min.
>
> I'm waiting until SAS3 is affordable for me. For the future I have on my 
> list: LSI SAS 3008 HBA and SAS3 SSDs. For personal home use, Fusion IO is 
> too heavy for my wallet. Even for commercial purpose I do not consider it 
> as a cost effective solution.
>
> Just a note: if you want to spend your money to accelerate ES, buy RAM. You 
> will get more performance than from drives. Reason is the lower latency. 
> Low latency will speed up applications like ES more than the fastest I/O 
> drive is able to. That reminds me that I'm waiting since ages for DDR4 
> RAM...
>
> Jörg
>
>
> On Thu, Jul 10, 2014 at 10:13 PM, John Smith wrote:
>
>> Using 1.2.1
>>
>> I know each system and functionality is different but just curious when 
>> people say buy SSDs for ES, what types of SSDs are they buying?
>>
>> Fortunately for me I had some Fusion IO cards to test with, but just 
>> wondering if it's worth the price and if I should look into off the shelf 
>> SSDs like Samsung EVOs using SAS instead of pure SATA.
>>
>> So far from my testing, all search operations regardless of drive type 
>> seem to return in the same amount of time, so I suppose 
>> caching is playing a huge part here.
>>
>> Though when looking at the HQ indexing stats like query time, fetch time, 
>> refresh time, etc., the Fusion IO fares a bit better than regular SSDs 
>> using SATA.
>>
>> For instance refresh time for Fusion IO is 250ms while for regular SSDs 
>> (SATA NOT SAS, will test SAS when I get a chance) it's just above 1 second.
>> Even with Fusion IO I do see some warnings on the index stats, but 
>> slightly better than regular SSDs
>>
>> Some strategies I picked for my indexes...
>> - New index per day, plus routing by "user"
>> - New index per day for monster users.
>>
>> Using JMeter to test...
>> - Achieved 3,500 index operations per second (Not bulk) avg document size 
>> 2,500 bytes (Fusion IO seemed to perform a bit better)
>> - Created a total of 25 indexes totaling over 100,000,000 documents 
>> anywhere between 3,000,000 to 5,000,000 documents per index.
>> - Scroll query to retrieve 15,000,000 documents out of the 100,000,000 
>> (all indexes) took 25 minutes regardless of drive type.
>>
>> P.s: I want to index 2,000,000,000 documents per year so about 4,000,000 
>> per day. So you can see why Fusion IO could be expensive :)
>>
>> Thanks
>>
>>
>>
>>
>>
>>
>>  -- 
>> You received this message because you are subscribed to the Google Groups 
>> "elasticsearch" group.
>> To unsubscribe from this group and stop receiving emails from it, send an 
>> email to elasticsearc...@googlegroups.com .
>> To view this discussion on the web visit 
>> https://groups.google.com/d/msgid/elasticsearch/24928d08-6354-4661-8164-9ff665709285%40googlegroups.com
>>  
>> <https://groups.google.com/d/msgid/elasticsearch/24928d08-6354-4661-8164-9ff665709285%40googlegroups.com?utm_medium=email&utm_source=footer>
>> .
>> For more options, visit https://groups.google.com/d/optout.
>>
>
>



What types of SSDs?

2014-07-10 Thread John Smith
Using 1.2.1

I know each system and functionality is different but just curious when 
people say buy SSDs for ES, what types of SSDs are they buying?

Fortunately for me I had some Fusion IO cards to test with, but just 
wondering if it's worth the price and if I should look into off the shelf 
SSDs like Samsung EVOs using SAS instead of pure SATA.

So far from my testing, all search operations regardless of drive type seem 
to return in the same amount of time. So I suppose caching is playing a huge 
part here.

Though when looking at the HQ indexing stats like query time, fetch time, 
refresh time etc., the Fusion IO fares a bit better than regular SSDs 
using SATA.

For instance, refresh time for Fusion IO is 250ms while for regular SSDs 
(SATA NOT SAS, will test SAS when I get a chance) it's just above 1 second.
Even with Fusion IO I do see some warnings on the index stats, but it's 
slightly better than regular SSDs.

Some strategies I picked for my indexes...
- New index per day, plus routing by "user"
- New index per day for monster users.
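The daily-index naming in the list above can be sketched in plain Java. The `events-` prefix and the `yyyy.MM.dd` date pattern are assumptions for illustration, not anything ES enforces:

```java
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.TimeZone;

public class DailyIndexNamer {
    // Builds a daily index name like "events-2014.07.10". The prefix and
    // date format are hypothetical; adjust them to your own convention.
    static String indexFor(Date day) {
        SimpleDateFormat fmt = new SimpleDateFormat("yyyy.MM.dd");
        fmt.setTimeZone(TimeZone.getTimeZone("UTC"));
        return "events-" + fmt.format(day);
    }

    public static void main(String[] args) {
        System.out.println(indexFor(new Date(0L))); // epoch -> events-1970.01.01
    }
}
```

The indexing client would compute this name per document (or per batch), so writes roll over to a fresh index at midnight UTC.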

Using JMeter to test...
- Achieved 3,500 index operations per second (Not bulk) avg document size 
2,500 bytes (Fusion IO seemed to perform a bit better)
- Created a total of 25 indexes totaling over 100,000,000 documents 
anywhere between 3,000,000 to 5,000,000 documents per index.
- Scroll query to retrieve 15,000,000 documents out of the 100,000,000 (all 
indexes) took 25 minutes regardless of drive type.

P.s: I want to index 2,000,000,000 documents per year so about 4,000,000 
per day. So you can see why Fusion IO could be expensive :)

Thanks








Jepsen article reaction

2014-06-24 Thread John Smith
I was wondering what reaction the community had to this article:

http://aphyr.com/posts/317-call-me-maybe-elasticsearch

I would be interested in a response from knowledgeable users/developers.

After reading this article, I wanted to know:

1) If the issues brought up in the article are valid, how likely are you 
to encounter them in production?
2) If they are valid, how can you minimize them with the current code base?
3) What is being done in the short/medium/long term to address these issues? 
Are there any particular issues we can follow to track progress?

TIA



Re: Marvel 1.2.0 java.lang.IllegalStateException

2014-06-13 Thread John Smith
Ok works thanks

On Friday, 13 June 2014 10:02:06 UTC-4, Paweł Krzaczkowski wrote:
>
> Hi,
>
> Yes it's been released as Marvel 1.2.1
>
>
> 2014-06-13 16:01 GMT+02:00 John Smith >:
>
>> Is that released? Or is it still on GitHub?
>>
>> Experiencing the same thing...
>>
>> Ran the commands from above...
>>
>> http://pastebin.com/WUTTLgsS
>>
>>
>> On Monday, 9 June 2014 14:44:17 UTC-4, Paweł Krzaczkowski wrote:
>>
>>> It works .. thx for a quick fix
>>>
>>>
>>> 2014-06-09 17:48 GMT+02:00 Paweł Krzaczkowski :
>>>
>>>> I'm out of office for today, so I'll test it tomorrow morning and let you 
>>>> know if it works
>>>>
>>>> pawel (at) mobile
>>>>
>>>> On 9 cze 2014, at 17:40, Boaz Leskes  wrote:
>>>>
>>>> Hi Pawel,
>>>>
>>>> We just did a quick minor release to marvel with a fix for this. Would 
>>>> be great if you can give it a try and confirm how it goes.
>>>>
>>>> Cheers,
>>>> Boaz
>>>>
>>>> On Friday, June 6, 2014 12:01:52 PM UTC+2, Boaz Leskes wrote:
>>>>>
>>>>> Thx Pawel,
>>>>>
>>>>> Not huge, but larger than the limit. Working on a fix.
>>>>>
>>>>>
>>>>> On Friday, June 6, 2014 10:10:45 AM UTC+2, Paweł Krzaczkowski wrote:
>>>>>>
>>>>>> This one is without metadata
>>>>>>
>>>>>> http://pastebin.com/tmJGA5Kq
>>>>>> http://xxx:9200/_cluster/state/version,master_node,nodes,
>>>>>> routing_table,blocks/?human&pretty
>>>>>>
>>>>>> Pawel
>>>>>>
>>>>>> W dniu piątek, 6 czerwca 2014 09:28:30 UTC+2 użytkownik Boaz Leskes 
>>>>>> napisał:
>>>>>>>
>>>>>>>  HI Pawel,
>>>>>>>
>>>>>>> I see - your cluster state (nodes + routing only, not metadata) 
>>>>>>> seems to be larger than 16KB when rendered to SMILE, which is quite 
>>>>>>> big - does this make sense?
>>>>>>>
>>>>>>> Above 16KB, an underlying paging system introduced in the ES 1.x 
>>>>>>> branch kicks in. And that breaks something in Marvel, which normally 
>>>>>>> ships very small documents.
>>>>>>>
>>>>>>> I'll work on a fix. Can you confirm your cluster state (again, 
>>>>>>> without the metadata) is indeed very large?
>>>>>>>
>>>>>>> Cheers,
>>>>>>> Boaz
>>>>>>>
>>>>>>> On Thursday, June 5, 2014 10:56:00 AM UTC+2, Paweł Krzaczkowski 
>>>>>>> wrote:
>>>>>>>>
>>>>>>>> Hi.
>>>>>>>>
>>>>>>>> After upgrading Marvel to 1.2.0 (running on Elasticsearch 1.2.1) 
>>>>>>>> i'm getting errors like
>>>>>>>>
>>>>>>>> [2014-06-05 10:47:25,346][INFO ][node ] 
>>>>>>>> [es-m-3] version[1.2.1], pid[68924], build[6c95b75/2014-06-03T15:02
>>>>>>>> :52Z]
>>>>>>>> [2014-06-05 10:47:25,347][INFO ][node ] 
>>>>>>>> [es-m-3] initializing ...
>>>>>>>> [2014-06-05 10:47:25,367][INFO ][plugins  ] 
>>>>>>>> [es-m-3] loaded [marvel, analysis-icu], sites [marvel, head, 
>>>>>>>> segmentspy, 
>>>>>>>> browser, paramedic]
>>>>>>>> [2014-06-05 10:47:28,455][INFO ][node ] 
>>>>>>>> [es-m-3] initialized
>>>>>>>> [2014-06-05 10:47:28,456][INFO ][node ] 
>>>>>>>> [es-m-3] starting ...
>>>>>>>> [2014-06-05 10:47:28,597][INFO ][transport] 
>>>>>>>> [es-m-3] bound_address {inet[/0:0:0:0:0:0:0:0:9300]}, publish_address 
>>>>>>>> {inet[/192.168.0.212:9300]}
>>>>>>>> [2014-06-05 10:47:42,340][INFO ][cluster.service  ] 
>>>>>>>> [es-m-3] new_master [es-m-3][0H3grrJxTJunU1U6FmkIEg][es-m-3][inet[
>>>>>>>> 192.168.0.212/192.168.0.212:93

Re: Marvel 1.2.0 java.lang.IllegalStateException

2014-06-13 Thread John Smith
Is that released? Or is it still on GitHub?

Experiencing the same thing...

Ran the commands from above...

http://pastebin.com/WUTTLgsS


On Monday, 9 June 2014 14:44:17 UTC-4, Paweł Krzaczkowski wrote:
>
> It works .. thx for a quick fix
>
>
> 2014-06-09 17:48 GMT+02:00 Paweł Krzaczkowski  >:
>
>> I'm out of office for today, so I'll test it tomorrow morning and let you 
>> know if it works
>>
>> pawel (at) mobile
>>
>> On 9 cze 2014, at 17:40, Boaz Leskes > 
>> wrote:
>>
>> Hi Pawel,
>>
>> We just did a quick minor release to marvel with a fix for this. Would be 
>> great if you can give it a try and confirm how it goes.
>>
>> Cheers,
>> Boaz
>>
>> On Friday, June 6, 2014 12:01:52 PM UTC+2, Boaz Leskes wrote:
>>>
>>> Thx Pawel,
>>>
>>> Not huge, but larger than the limit. Working on a fix.
>>>
>>>
>>> On Friday, June 6, 2014 10:10:45 AM UTC+2, Paweł Krzaczkowski wrote:

 This one is without metadata

 http://pastebin.com/tmJGA5Kq
 http://xxx:9200/_cluster/state/version,master_node,
 nodes,routing_table,blocks/?human&pretty

 Pawel

 W dniu piątek, 6 czerwca 2014 09:28:30 UTC+2 użytkownik Boaz Leskes 
 napisał:
>
> HI Pawel,
>
> I see - your cluster state (nodes + routing only, not metadata) seems to 
> be larger than 16KB when rendered to SMILE, which is quite big - does 
> this make sense?
>
> Above 16KB, an underlying paging system introduced in the ES 1.x branch 
> kicks in. And that breaks something in Marvel, which normally ships very 
> small documents.
>
> I'll work on a fix. Can you confirm your cluster state (again, without 
> the metadata) is indeed very large?
>
> Cheers,
> Boaz
>
> On Thursday, June 5, 2014 10:56:00 AM UTC+2, Paweł Krzaczkowski wrote:
>>
>> Hi.
>>
>> After upgrading Marvel to 1.2.0 (running on Elasticsearch 1.2.1) i'm 
>> getting errors like
>>
>> [2014-06-05 10:47:25,346][INFO ][node ] [es-m-3] 
>> version[1.2.1], pid[68924], build[6c95b75/2014-06-03T15:02:52Z]
>> [2014-06-05 10:47:25,347][INFO ][node ] [es-m-3] 
>> initializing ...
>> [2014-06-05 10:47:25,367][INFO ][plugins  ] [es-m-3] 
>> loaded [marvel, analysis-icu], sites [marvel, head, segmentspy, browser, 
>> paramedic]
>> [2014-06-05 10:47:28,455][INFO ][node ] [es-m-3] 
>> initialized
>> [2014-06-05 10:47:28,456][INFO ][node ] [es-m-3] 
>> starting ...
>> [2014-06-05 10:47:28,597][INFO ][transport] [es-m-3] 
>> bound_address {inet[/0:0:0:0:0:0:0:0:9300]}, publish_address {inet[/
>> 192.168.0.212:9300]}
>> [2014-06-05 10:47:42,340][INFO ][cluster.service  ] [es-m-3] 
>> new_master [es-m-3][0H3grrJxTJunU1U6FmkIEg][es-m-3][inet[
>> 192.168.0.212/192.168.0.212:9300]]{data=false 
>> , 
>> master=true}, reason: zen-disco-join (elected_as_master)
>> [2014-06-05 10:47:42,350][INFO ][discovery] [es-m-3] 
>> freshmind/0H3grrJxTJunU1U6FmkIEg
>> [2014-06-05 10:47:42,365][INFO ][http ] [es-m-3] 
>> bound_address {inet[/0:0:0:0:0:0:0:0:9200]}, publish_address {inet[/
>> 192.168.0.212:9200]}
>> [2014-06-05 10:47:42,368][INFO ][node ] [es-m-3] 
>> started
>> [2014-06-05 10:47:44,098][INFO ][cluster.service  ] [es-m-3] 
>> added 
>> {[es-m-1][MHl5Ls-cRXCwc7OC-P0J5w][es-m-1][inet[/192.168.0.210:9300]]{data=false,
>>  
>> machine=44454c4c-5300-1052-8038-b9c04f5a5a31, master=true},}, 
>> reason: zen-disco-receive(join from node[[es-m-1][MHl5Ls-
>> cRXCwc7OC-P0J5w][es-m-1][inet[/192.168.0.210:9300]]{data=false, 
>> machine=44454c4c-5300-1052-8038-b9c04f5a5a31, master=true}])
>> [2014-06-05 10:47:44,401][INFO ][gateway  ] [es-m-3] 
>> recovered [28] indices into cluster_state
>> [2014-06-05 10:47:48,683][ERROR][marvel.agent ] [es-m-3] 
>> exporter [es_exporter] has thrown an exception:
>> java.lang.IllegalStateException: array not available
>> at org.elasticsearch.common.bytes.PagedBytesReference.
>> array(PagedBytesReference.java:289)
>> at org.elasticsearch.marvel.agent.exporter.ESExporter.
>> addXContentRendererToConnection(ESExporter.java:209)
>> at org.elasticsearch.marvel.agent.exporter.ESExporter.
>> exportXContent(ESExporter.java:252)
>> at org.elasticsearch.marvel.agent.exporter.ESExporter.
>> exportEvents(ESExporter.java:161)
>> at org.elasticsearch.marvel.agent.AgentService$
>> ExportingWorker.exportEvents(AgentService.java:305)
>> at org.elasticsearch.marvel.agent.AgentService$
>> ExportingWorker.run(AgentService.java:240)
>> at java.lang.Thread.run(Thread.java:74

Re: Understanding merge statistics from Marvel

2014-06-08 Thread John Smith
I know benchmarking is a tough subject! But what do those numbers mean?

On Friday, 6 June 2014 12:17:22 UTC-4, John Smith wrote:
>
> Running Elasticsearch 1.2.1 with Java 1.7_55 on CentOs 6.5
>
> The machine is a 32 core, 96GB box with standard spinning disks, but I also 
> installed 1 Samsung Evo 840 for testing ES.
> The Evo is rated at 500MB/s, though the Linux perf test reported about 
> 300MB/s read and about 250MB/s write. The board is SATA II, which explains 
> why it tops out around 300MB/s.
>
> Using JMeter to send index requests to ES
>
> Executing about 6200 puts/s
>
> Marvel reports 
> 2200 IOPS
> 20MB/s of merges
>
> And iostat for the drive
>
> Device rrqm/s  wrqm/s   r/s  w/s     rMB/s wMB/s avgrq-sz avgqu-sz await svctm %util
> sdf    0.00    14214.00 0.00 2021.33 0.00  62.35 63.17    10.49    5.17  0.48  97.27
>
> Also seeing on the console: stop throttling indexing: 
> numMergesInFlight=4, maxNumMerges=5
>
> Are these numbers good?
>



Understanding merge statistics from Marvel

2014-06-06 Thread John Smith
Running Elasticsearch 1.2.1 with Java 1.7_55 on CentOs 6.5

The machine is a 32 core, 96GB box with standard spinning disks, but I also 
installed 1 Samsung Evo 840 for testing ES.
The Evo is rated at 500MB/s, though the Linux perf test reported about 
300MB/s read and about 250MB/s write. The board is SATA II, which explains 
why it tops out around 300MB/s.

Using JMeter to send index requests to ES

Executing about 6200 puts/s

Marvel reports 
2200 IOPS
20MB/s of merges

And iostat for the drive

Device rrqm/s  wrqm/s   r/s  w/s     rMB/s wMB/s avgrq-sz avgqu-sz await svctm %util
sdf    0.00    14214.00 0.00 2021.33 0.00  62.35 63.17    10.49    5.17  0.48  97.27

Also seeing on the console: stop throttling indexing: numMergesInFlight=4, 
maxNumMerges=5

Are these numbers good?
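As a rough cross-check, the iostat figures are internally consistent: w/s times the average request size (avgrq-sz, reported in 512-byte sectors) should land near the reported write throughput. A small worked example using the numbers from the post:

```java
public class IostatCheck {
    // avgrq-sz is in 512-byte sectors; multiplying by writes/s gives
    // bytes/s, which should roughly match the reported write throughput.
    static double writeMBPerSec(double writesPerSec, double avgReqSectors) {
        return writesPerSec * avgReqSectors * 512 / 1e6;
    }

    public static void main(String[] args) {
        // w/s = 2021.33 and avgrq-sz = 63.17, from the iostat line above.
        System.out.printf("%.1f MB/s%n", writeMBPerSec(2021.33, 63.17)); // ~65 MB/s, near the ~62 MB/s reported
    }
}
```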



Trying to understand Marvel stats and merges with SSD

2014-06-06 Thread John Smith
Running Elasticsearch 1.2.1 with Java 1.7_55 on CentOs 6.5

The machine is a 32 core, 96GB box with standard spinning disks, but I also 
installed 1 Samsung Evo 840 for testing ES.
The Evo is rated at 500MB/s, though the Linux perf test reported about 
300MB/s read and about 250MB/s write. The board is SATA II, which explains 
why it tops out around 300MB/s.

Using JMeter to send index requests to ES

Executing about 6200 puts/s

Marvel reports 
2200 IOPS
20MB/s of merges

And iostat for the drive

Device rrqm/s  wrqm/s   r/s  w/s     rMB/s wMB/s avgrq-sz avgqu-sz await svctm %util
sdf    0.00    14214.00 0.00 2021.33 0.00  62.35 63.17    10.49    5.17  0.48  97.27

Also seeing on the console: stop throttling indexing: numMergesInFlight=4, 
maxNumMerges=5

Are these numbers good?



Re: Stats aggregation of a date_histogram

2014-05-27 Thread John Smith
Thanks. Is this something planned?



Stats aggregation of a date_histogram

2014-05-26 Thread John Smith
Using ES 1.2

Is there a way to aggregate an aggregation?

So say I have a query for "views" per seconds for the last 5 minutes...

POST /xxx/xxx/_search
{
  "size": 0,
  "aggs": {
    "last_5_mins": {
      "filter": {
        "range": {
          "viewed": {
            "gte": "now-5m",
            "lte": "now"
          }
        }
      },
      "aggs": {
        "views_per_second": {
          "date_histogram": {
            "field": "viewed",
            "interval": "second"
          }
        }
      }
    }
  }
}

I would like to get a stats aggregation over the date_histogram.

So I would like to know what my highest peak of views was and my average 
views per second.
Is that possible, or do I have to do those calculations myself based on the 
aggregation returned?
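Until something like that exists, one workaround is exactly the do-it-yourself route: pull the `doc_count` of each bucket out of the date_histogram response and compute the stats client-side. A minimal sketch in plain Java (the bucket counts here are hypothetical):

```java
import java.util.Arrays;
import java.util.List;

public class BucketStats {
    // Given per-second doc counts from a date_histogram response, compute
    // the peak and average views per second client-side.
    static long peak(List<Long> counts) {
        long max = 0;
        for (long c : counts) max = Math.max(max, c);
        return max;
    }

    static double average(List<Long> counts) {
        long sum = 0;
        for (long c : counts) sum += c;
        return counts.isEmpty() ? 0.0 : (double) sum / counts.size();
    }

    public static void main(String[] args) {
        // Hypothetical doc_count values, one per one-second bucket.
        List<Long> counts = Arrays.asList(3L, 7L, 5L, 9L, 6L);
        System.out.println(peak(counts));    // 9
        System.out.println(average(counts)); // 6.0
    }
}
```

This is pure post-processing of the response; the buckets themselves still come from the date_histogram query shown above.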



Re: ES indexing exactly half the documents.

2014-05-26 Thread John Smith
Never mind complete clown here. That's me! :p

I was looking at the JMeter summary report, where my test script has 
multiple samplers (2 to be exact), and I was looking at the total and not 
how many HTTP requests I sent. The numbers match lol! Never mind!

As you were!

On Monday, 26 May 2014 11:40:38 UTC-4, John Smith wrote:
>
> Used PUT not POST. Also seems to be doing the same thing with the Java 
> client as embedded node.
>
> No matter how many requests i send only half the docs get indexed.
>
> On Monday, 26 May 2014 11:35:04 UTC-4, John Smith wrote:
>>
>> Using ES 1.1.1
>>
>> I have 4 node cluster
>>
>> 1 primary 3 workers
>>
>> Starting each node with elasticsearch.bat (using Windows 2003, Java 1.7_45 
>> and a 16GB heap)
>>
>> Using the index api to create the index automatically on the first POST 
>> operation
>>
>> POST http://10.0.0.xxx:9200/xxx/abc/
>>
>> POST data:
>> {
>> "account" : 12345678,
>> "started": "2014-05-26T15:30:14.910",
>> "ended": "2014-05-26T15:30:17.697"
>> }
>>
>>
>> Sending direct http using JMeter. I sent 5000 requests give or take, but 
>> Marvel and Head show that only 2500 documents got indexed???
>>
>> Am I missing something? Is it a config thing?
>>
>



Re: ES indexing exactly half the documents.

2014-05-26 Thread John Smith
Used PUT not POST. Also seems to be doing the same thing with the Java 
client as embedded node.

No matter how many requests i send only half the docs get indexed.

On Monday, 26 May 2014 11:35:04 UTC-4, John Smith wrote:
>
> Using ES 1.1.1
>
> I have 4 node cluster
>
> 1 primary 3 workers
>
> Starting each node with elasticsearch.bat (using Windows 2003, Java 1.7_45 
> and a 16GB heap)
>
> Using the index api to create the index automatically on the first POST 
> operation
>
> POST http://10.0.0.xxx:9200/xxx/abc/
>
> POST data:
> {
> "account" : 12345678,
> "started": "2014-05-26T15:30:14.910",
> "ended": "2014-05-26T15:30:17.697"
> }
>
>
> Sending direct http using JMeter. I sent 5000 requests give or take, but 
> Marvel and Head show that only 2500 documents got indexed???
>
> Am I missing something? Is it a config thing?
>



ES indexing exactly half the documents.

2014-05-26 Thread John Smith
Using ES 1.1.1

I have a 4 node cluster

1 primary 3 workers

Starting each node with elasticsearch.bat (using Windows 2003, Java 1.7_45 
and a 16GB heap)

Using the index api to create the index automatically on the first POST 
operation

POST http://10.0.0.xxx:9200/xxx/abc/

POST data:
{
"account" : 12345678,
"started": "2014-05-26T15:30:14.910",
"ended": "2014-05-26T15:30:17.697"
}


Sending direct http using JMeter. I sent 5000 requests give or take, but 
Marvel and Head show that only 2500 documents got indexed???

Am I missing something? Is it a config thing?



Re: Question about time based indexes/rolling indexes and eviction policies?

2014-05-26 Thread John Smith
Thanks!

On Monday, 26 May 2014 03:58:15 UTC-4, Jörg Prante wrote:
>
> 1. I will add a timeseries mode to my JDBC plugin soon. Right now you can 
> create timestamps with bash (or your favorite shell) and append it as a 
> suffix to the index name into the river/feeder creation call, but this can 
> be automated. No ETA yet.
>
> 2. This is also a nifty feature, I will experiment with the JDBC plugin if 
> I can estimate the data volume to index (probably from the data volume of 
> previous runs) or if I can make an educated guess about data growth in ES 
> data folders, and will refuse to continue if a limit is exceeded. Index 
> data volume can fluctuate due to segment creations and merging so this 
> would have to include an optimization strategy, or I rely on the JDBC 
> source. 
>
> Eviction is a harder topic, since I hesitate to create a plugin that can 
> delete data without user interaction. Even eviction rules in a plugin 
> configuration may contain mistakes and are risky. But I also see the 
> usefulness of obsoleting indexed data by dropping them regularly. I don't 
> want to take responsibility for this in the JDBC plugin, so this may just 
> be another plugin implementation.
>
> Jörg
>
>
> On Fri, May 23, 2014 at 8:13 PM, John Smith 
> > wrote:
>
>> #1
>> I have been reading around and some people suggest if doing "log" 
>> analytics to split the index based on time.
>> Is this built in into Elastic search or does it mean I have to do it 
>> manual?
>>
>> If manual
>>
>> PUT http://myhost:9200/myindex-(get-current-date-here)/SomeDoc/Id
>>
>> I'm pulling my data from SQL server and going to either use ETL or JDBC 
>> gatherer. I suppose the ETL process needs to consider the date and when it 
>> does its index PUT, to check and roll over the date so that a new index 
>> gets created?
>> And my queries need to consider this also so they know that on each day 
>> they need to search the new index?
>>
>> #2 is there such a thing as eviction policies?
>> Basically, is there a way to check if we are running out of disk space and 
>> to either remove entries from the index or, in the above case, delete/archive 
>> indexes older than a few days?
>>
>>
>>
>>
>>
>>
>
>



Question about time based indexes/rolling indexes and eviction policies?

2014-05-23 Thread John Smith
#1
I have been reading around and some people suggest, if doing "log" analytics, 
splitting the index based on time.
Is this built into Elasticsearch, or does it mean I have to do it manually?

If manual

PUT http://myhost:9200/myindex-(get-current-date-here)/SomeDoc/Id

I'm pulling my data from SQL server and going to either use ETL or JDBC 
gatherer. I suppose the ETL process needs to consider the date and when it 
does its index PUT, to check and roll over the date so that a new index 
gets created?
And my queries need to consider this also so they know that on each day 
they need to search the new index?

#2 is there such a thing as eviction policies?
Basically, is there a way to check if we are running out of disk space and to 
either remove entries from the index or, in the above case, delete/archive 
indexes older than a few days?
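As far as I know there is no built-in eviction policy; with daily indices the usual approach is an external job that deletes indices older than N days (tools like Curator grew out of exactly this pattern). A rough sketch of the date check, assuming a hypothetical logs-yyyy.MM.dd naming convention:

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.ArrayList;
import java.util.Date;
import java.util.List;

public class StaleIndexFinder {
    // Returns the index names whose embedded date falls before the cutoff.
    // The "logs-yyyy.MM.dd" pattern is an assumption, not an ES convention.
    static List<String> olderThan(List<String> names, Date cutoff) throws ParseException {
        SimpleDateFormat fmt = new SimpleDateFormat("yyyy.MM.dd");
        List<String> stale = new ArrayList<String>();
        for (String name : names) {
            Date d = fmt.parse(name.substring(name.indexOf('-') + 1));
            if (d.before(cutoff)) stale.add(name);
        }
        return stale;
    }

    public static void main(String[] args) throws ParseException {
        SimpleDateFormat fmt = new SimpleDateFormat("yyyy.MM.dd");
        List<String> names = new ArrayList<String>();
        names.add("logs-2014.05.20");
        names.add("logs-2014.05.23");
        System.out.println(olderThan(names, fmt.parse("2014.05.22"))); // [logs-2014.05.20]
    }
}
```

Anything returned would then be removed with a DELETE on the index name; actual deletion is probably best left as a deliberate, external step rather than something fully automatic.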







Re: Performance tuning ES for in-memory

2014-04-23 Thread John Smith
Ok, so I decided to skip in-memory for now, just to test basic functionality.

I'm running elastic search with defaults as

./elasticsearch -Xms32g -Xmx32g

I also got bigdesk installed.

Either I'm not getting something... but why, as I write more documents to 
the index, does the

Indexing requests per second (Δ)

go down while the

Indexing time per second (Δ)

goes up?

So basically it's getting slower and slower.

Are there any sensible tuning parameters? I would expect insertion to 
remain stable, not degrade over time.

It's a 32 core machine with enough RAM, though with standard drives. There 
has to be a way to set up buffers and queues to alleviate the issue of disks 
"being slow".


On Wednesday, April 23, 2014 10:42:37 AM UTC-4, John Smith wrote:
>
> On a 32 core machine? Plus I think 1.7_51 uses G1
>
> I have tested another "indexing" api up to 190GB or so with 30,000,000 
> objects and my latency was 3ms overall including network and app logic.
>
> And I haven't tested that many records with elastic search yet ;)
>
>
> On Wednesday, 23 April 2014 10:20:20 UTC-4, Jörg Prante wrote:
>>
>> The ES "memory" or "ram" store (Lucene RAMDirectory) puts enormous 
>> pressure on JVM garbage collection.
>>
>> You can not expect that standard JVM with CMS GC can give the best 
>> performance.
>>
>> More info in this great article by Mike McCandless
>>
>>
>> http://blog.mikemccandless.com/2012/07/lucene-index-in-ram-with-azuls-zing-jvm.html
>>
>> Maybe Java 8 with G1 GC is giving slightly better numbers. But do not 
>> expect too much.
>>
>> Jörg
>>
>>
>>
>>
>>
>> On Wed, Apr 23, 2014 at 3:19 PM, John Smith  wrote:
>>
>>> 1.7_51, but I don't see how there could be a limitation.
>>>
>>> I've used Java up to 200GB easily and with no issues either...
>>>
>>>
>>
>>



Re: Performance tuning ES for in-memory

2014-04-23 Thread John Smith
On a 32 core machine? Plus I think 1.7_51 uses G1

I have tested another "indexing" api up to 190GB or so with 30,000,000 
objects and my latency was 3ms overall including network and app logic.

And I haven't tested that many records with elastic search yet ;)


On Wednesday, 23 April 2014 10:20:20 UTC-4, Jörg Prante wrote:
>
> The ES "memory" or "ram" store (Lucene RAMDirectory) puts enormous 
> pressure on JVM garbage collection.
>
> You can not expect that standard JVM with CMS GC can give the best 
> performance.
>
> More info in this great article by Mike McCandless
>
>
> http://blog.mikemccandless.com/2012/07/lucene-index-in-ram-with-azuls-zing-jvm.html
>
> Maybe Java 8 with G1 GC is giving slightly better numbers. But do not 
> expect too much.
>
> Jörg
>
>
>
>
>
> On Wed, Apr 23, 2014 at 3:19 PM, John Smith 
> > wrote:
>
>> 1.7_51, but I don't see how there could be a limitation.
>>
>> I've used Java up to 200GB easily and with no issues either...
>>
>>
>
>



Re: Performance tuning ES for in-memory

2014-04-23 Thread John Smith
1.7_51, but I don't see how there could be a limitation.

I've used Java up to 200GB easily and with no issues either...



Re: Performance tuning ES for in-memory

2014-04-22 Thread John Smith
I wrote in my post: 1.7_51.

And the docs seem to mention that an in-memory index can be as big as RAM.

And I have run an app up to 196GB with another "indexing" API called cq-engine.



Performance tuning ES for in-memory

2014-04-22 Thread John Smith
Hi, I downloaded the latest ES, 1.1.1.

I have a machine with 200GB of RAM, 2 x 8 hyper-threaded cores ("32" 
logical cores total), and 1.6TB of disk space.

I start Elasticsearch as follows...

./elasticsearch -Xms100g -Xmx100g -Des.index.store.type=memory
Using Java 1.7_51
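
For reference, the same setup can be expressed through the standard startup 
hooks rather than raw JVM flags on the command line; this is only a sketch, 
assuming the stock ES 1.x distribution layout (bin/elasticsearch, 
config/elasticsearch.yml):

```shell
# Sketch, assuming a stock ES 1.x layout: set the heap via ES_HEAP_SIZE
# (picked up by bin/elasticsearch) and set the store type in the config
# file, so every new index defaults to the in-memory store.
export ES_HEAP_SIZE=100g
echo "index.store.type: memory" >> config/elasticsearch.yml
./bin/elasticsearch
```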

I then create my index as follows...

$ curl -XPUT http://localhost:9200/myindex/ -d '
index:
  store:
    type: memory
'
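
It's also worth confirming the setting actually applied; a quick check 
(assuming the node is listening on localhost:9200 as above):

```shell
# Fetch the live settings for the index; the response should contain
# "index.store.type" : "memory" if the YAML body above was accepted.
curl -XGET 'http://localhost:9200/myindex/_settings?pretty'
```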

And my Java web app (using vertx.io):

// On app startup... ensure we have one instance of the client,
// regardless of how many app threads may write to the index.
synchronized (clientCreated) {
    if (clientCreated.compareAndSet(false, true)) {
        node = nodeBuilder().clusterName("elasticsearch").client(true).node();
        client = node.client();
    }
}


// Per request coming into my web application (using vertx for the web
// framework). Each request reuses the one client instance.
client.prepareIndex("myindex", "doc", request.getString("id"))
    .setSource(bodyStr) // already sending JSON, so no need to convert it
    .execute(new ActionListener<IndexResponse>() {

        @Override
        public void onFailure(Throwable t) {
            req.response().end("Error: " + t.toString());
        }

        @Override
        public void onResponse(IndexResponse res) {
            req.response().end(res.getIndex());
        }
    });


Both the webapp and ES run on the same server, so all write/read requests 
go over localhost.

Testing as follows:

JMeter (100 users, running on my desktop) --remote--> WebApp --localhost--> ES

I get about 6000 writes/sec, and it seems to get lower as the number of 
docs indexed increases.
Average request/response latency is about 15-20ms.
Network time / JMeter data generation (each document is about 1000 bytes) / 
web app overhead is about 5ms; I know this because I also have a simple 
hello-world response to measure the average latency of those 3 "parameters".
So it seems that the in-memory index takes 15ms on average; I would think 
ES can do much better than that?
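
For what it's worth, ~6000 writes/sec is roughly what Little's Law predicts 
for 100 concurrent users at ~15ms per request, so the load generator itself 
may be the bottleneck rather than ES. A minimal sketch of the arithmetic 
(numbers taken from the figures above):

```java
// Little's Law: throughput = concurrency / latency.
// With 100 JMeter users each waiting ~15 ms per request, the load
// generator caps out near the observed ~6000 writes/sec.
public class LittleLaw {
    static double throughputPerSec(int concurrentUsers, double latencySeconds) {
        return concurrentUsers / latencySeconds;
    }

    public static void main(String[] args) {
        // prints 6667
        System.out.println(Math.round(throughputPerSec(100, 0.015)));
    }
}
```

If that arithmetic holds, raising the JMeter thread count should raise 
throughput until ES itself becomes the limiting factor.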

Are there any tuning settings I can try for a strictly in-memory index?

Thanks