Re: How to diagnose slow queries every 10 minutes exactly?

2015-04-21 Thread AlexR
It could be entirely unrelated, but if I recall correctly, someone reported similar slowness at regular
intervals. It turned out to be caused by the load balancer they were using.

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/9073c7ba-7fdd-4ed5-80dd-c0499e618f9a%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


Re: Is ElasticSearch truly scalable for analytics?

2015-01-15 Thread AlexR
I would also be very interested in node-level reduction of shard results, not
for scalability but for precision reasons. I would like to have an option for a
node to do complete aggregations on its shards so that the results are exact rather
than approximate. There are many use cases where the corpus of data is relatively
small and fits on one powerful node, and exactness is a MUST. With 48-core servers
and SSD drives, such a node can process a good deal of data and produce exact
results, which is a must for traditional datamart-like apps. Having this option
would allow this class of apps to be built. And in a multi-node setup it would
provide better precision too.
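To make the precision point concrete, here is a toy Python sketch (plain Python, not Elasticsearch internals; the shard data and function names are made up) of a count-ordered terms aggregation in which each shard returns only its top shard_size buckets. Reducing complete per-shard counts, as a single node holding all the shards could do, gives the exact answer; reducing truncated lists can lose the true top term.

```python
from collections import Counter

def shard_top_terms(shard_docs, shard_size):
    """Count terms on one shard and keep only the shard_size most frequent."""
    return dict(Counter(shard_docs).most_common(shard_size))

def distributed_top_terms(shards, size, shard_size):
    """Approximate: reduce each shard's truncated bucket list, then take the top `size`."""
    merged = Counter()
    for docs in shards:
        merged.update(shard_top_terms(docs, shard_size))
    return merged.most_common(size)

def exact_top_terms(shards, size):
    """Exact: reduce complete per-shard counts before picking the top `size`."""
    merged = Counter()
    for docs in shards:
        merged.update(Counter(docs))
    return merged.most_common(size)

# Hypothetical data: "c" is never the top term on any shard, yet it is the global winner.
shards = [
    ["a"] * 5 + ["c"] * 4,
    ["b"] * 5 + ["c"] * 4,
    ["d"] * 5 + ["c"] * 4,
]

approx = distributed_top_terms(shards, size=1, shard_size=1)        # some term with count 5; "c" is lost
exact = exact_top_terms(shards, size=1)                              # [("c", 12)]
approx_wider = distributed_top_terms(shards, size=1, shard_size=2)  # [("c", 12)]
```

With shard_size=1 the term c, which is second on every shard, never reaches the reduce step; raising shard_size to 2 recovers it in this toy case, which is the usual trade of shard_size against accuracy and cost.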



Aggregations Parallelism Level

2014-12-26 Thread AlexR
Hi,

What's the level of parallelism when aggregations are calculated? Is it one thread
per shard?
In that case, I assume a node hosting one index should have roughly one
shard per server core?

Is it the same for searching, or does Lucene support parallel search on the same
index (ES shard)?

Thanks
Alex
 



Re: Is ElasticSearch truly scalable for analytics?

2014-12-19 Thread AlexR
Jorg, if you have a single large index and a cluster with 3 nodes, do you
suggest creating just 3 shards even though each node has, say, 16 cores? With
just three shards, they will be very big and not much parallelism in
computations will occur.
Am I missing something?



Re: Only partial results returned for aggregation + ElasticsearchIllegalStateException when trying scroll

2014-12-18 Thread AlexR
I second that. Many of us need accurate results at the expense of
performance. So an optional two-step execution for results correction (for
buckets not present in all shard responses) would be very helpful!
A great first step would be to do this on a single node (if not already done)
when aggregating its shards, as it does not impose as much overhead (no need
for extra network calls).
For me personally it would be super helpful, because we have a large number
of fairly small datasets which fit on one good server, and we need exact
analysis.
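The two-step correction requested above can be sketched in plain Python (toy data, hypothetical function name; this is not how Elasticsearch is implemented). The first pass collects candidate terms from each shard's top list; the second pass asks every shard for the exact count of each candidate, filling in buckets that some shards did not return.

```python
from collections import Counter

def two_phase_top_terms(shards, size, shard_size):
    """Hypothetical two-pass terms reduce over in-memory shards."""
    # Pass 1: the union of every shard's top `shard_size` terms becomes the candidate set.
    candidates = set()
    for docs in shards:
        candidates.update(term for term, _ in Counter(docs).most_common(shard_size))
    # Pass 2: every shard reports its exact count for every candidate, including
    # candidates it did not return in pass 1.
    exact = Counter()
    for docs in shards:
        counts = Counter(docs)
        for term in candidates:
            exact[term] += counts[term]
    # Order by count descending, then by term for a deterministic result.
    return sorted(exact.items(), key=lambda kv: (-kv[1], kv[0]))[:size]

shards = [
    ["a"] * 5 + ["b"] * 4 + ["c"],
    ["b"] * 5 + ["a"] + ["c"] * 4,
]
# A single pass with shard_size=1 would see a:5 and b:5 (a tie, both undercounted).
top = two_phase_top_terms(shards, size=2, shard_size=1)  # [("b", 9), ("a", 6)]
```

The second pass makes the counts of the returned buckets exact, but on its own it does not guarantee that the returned list is the true top N: a term outside every shard's candidate list is never considered.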

Also see my (admittedly naive) comment on
https://groups.google.com/forum/?utm_medium=email&utm_source=footer#!topic/elasticsearch/aLBv2QB7VMg

Nick,

I am not an expert in this area either, but with multi-core processors (24,
32, 48 cores) it is not uncommon to have a fairly large number of shards on a node,
so 30 shards is not out of the question.
I assumed that ES aggregates shard results on a node prior to shipping them
to the master, but I do not know if that is true. It may very well be that a node
sends per-shard aggregations to the master, in which case it is 32 x (shard
result size) for our 32-shard node. Reducing the size of the network packet by a
factor of 32 (even if it were just 8), and the work for the master by the same
ratio, is not chump change. Somehow I think ES is already doing it :-) but who knows.

Another potential benefit of node-level aggregation is that, on a single
node, when aggregating multiple shards, ES could resolve potential errors by
aggregating all buckets and re-calculating buckets not present in every
shard at a fairly low cost, while doing so across nodes is costly. On the
other hand, it may amplify the error across nodes; I do not know.


On Thursday, December 18, 2014 12:45:27 PM UTC-5, Eran Duchan wrote:
>
> Thanks Adrien
>
> *> Did you try to add pagination on a request of type COUNT?*
> Yes, I run the aggregation with search_type=count. 
>
> The thing is I need accurate results, not super fast execution. Scoring is 
> something we don't use and need so I would like _all_ relevant results 
> (i.e. results which pass the supplied query filter) across all shards/nodes 
> to be added to the list of results. I tried doing this by setting size to a 
> high number (100), but to no avail. I see the documentation you 
> referred to indicates that size:0 removes any limit, but shouldn't a high 
> size work as well? Is there an inherent limitation to run the query 
> originally posted and expect accurate results?
>
> Eran
>
>
>



Re: Is ElasticSearch truly scalable for analytics?

2014-12-18 Thread AlexR
Nick,

I am not an expert in this area either, but with multi-core processors (24,
32, 48 cores) it is not uncommon to have a fairly large number of shards on a node,
so 30 shards is not out of the question.
I assumed that ES aggregates shard results on a node prior to shipping them
to the master, but I do not know if that is true. It may very well be that a node
sends per-shard aggregations to the master, in which case it is 32 x (shard
result size) for our 32-shard node. Reducing the size of the network packet by a
factor of 32 (even if it were just 8), and the work for the master by the same
ratio, is not chump change. Somehow I think ES is already doing it :-) but who knows.

Another potential benefit of node-level aggregation is that, on a single
node, when aggregating multiple shards, ES could resolve potential errors by
aggregating all buckets and re-calculating buckets not present in every
shard at a fairly low cost, while doing so across nodes is costly. On the
other hand, it may amplify the error across nodes; I do not know.
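A rough Python sketch of the arithmetic in this thread (toy functions with hypothetical names; whether and how Elasticsearch reduces per node is exactly the open question here). Using Yifan's numbers of 50 data nodes, 32 shards per node and shard_size=50,000, it compares an upper bound on the buckets sent to the coordinating node with and without a node-level reduce, and shows the per-term merge such a reduce would perform.

```python
from collections import Counter

def reduce_buckets(bucket_lists, keep=None):
    """Sum doc counts per term across bucket lists; optionally keep only the top `keep` terms."""
    merged = Counter()
    for buckets in bucket_lists:
        merged.update(buckets)
    return dict(merged.most_common(keep)) if keep else dict(merged)

def buckets_shipped(nodes, shards_per_node, shard_size, node_level_reduce):
    """Upper bound on buckets sent to the coordinating node for one terms aggregation."""
    # With a node-level reduce (truncated back to shard_size), each node sends one list;
    # without it, each node sends one list per shard.
    per_node = shard_size if node_level_reduce else shards_per_node * shard_size
    return nodes * per_node

without_node_reduce = buckets_shipped(50, 32, 50_000, node_level_reduce=False)  # 80,000,000
with_node_reduce = buckets_shipped(50, 32, 50_000, node_level_reduce=True)      # 2,500,000

# Merging two shards' buckets on one node:
node_result = reduce_buckets([{"x": 3, "y": 1}, {"x": 2, "z": 4}])  # {"x": 5, "y": 1, "z": 4}
```

Truncating the node-level result back to shard_size is what keeps the payload small, but it is also a new point where count-ordered results can become approximate, which matches the caveat about amplifying the error across nodes.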


On Thursday, December 18, 2014 11:26:37 AM UTC-5, Nikolas Everett wrote:
>
> I think aggregating 32 shards on one node is a bit degenerate.  I imagine 
> it's more typical to aggregate across one or two shards per node.  Don't get 
> me wrong, you can totally have nodes store and query ~100 shards each 
> without much trouble.  If aggregating across a bunch of shards per node 
> were a common thing I think a node level reduce step might help.  I'm 
> certainly no expert in the reduce code though.
>
> Nik
>
> On Thu, Dec 18, 2014 at 10:48 AM, Yifan Wang wrote:
>>
>> Sorry, if I did not make it clear. For sure I know aggregation is done on 
>> the node for each shard, but here is the challenge. Say we set 
>> shard_size=50,000. ES will aggregate on each shard and create buckets for 
>> the matching documents, and then send top 50,000 buckets to the client node 
>> for Reduce. Say we have 50 data nodes, and each node has 32 shards. This 
>> means we need to send 50,000 buckets from each shard to the client node for 
>> final aggregation. First, this may add heavy traffic to the network (what 
>> if we have 100 nodes?). And second, the client will need to aggregate on 
>> received 50*32*50,000 buckets. Would this cause any congestion on the 
>> client node? However if we can aggregate on the node first, meaning reduce 
>> from 32 buckets to only one bucket, then the client node only has to 
>> process 50 buckets. This would significantly reduce the network traffic and 
>> improve the scalability, plus because we can set relatively larger 
>> shard_size, it will improve the accuracy of the final results, which is 
>> another key issue we are facing in distributed environment on aggregations.
>>
>> So my key question is about the scalability particularly on aggregations. 
>> It seems to be a challenge in my experience. I just want to hear other 
>> people's experience. On heavy analytics applications, this will be a key.
>>
>> Of course, I also understand, adding node level aggregation may impact 
>> the overall performance. I am wondering if anyone has thought about or done 
>> anything in this aspect.
>>
>> BTW, I like ElasticSearch, but want to hear from the community on some of 
>> the key challenges.
>>
>>
>>
>> On Thursday, December 18, 2014 9:34:07 AM UTC-5, Adrien Grand wrote:
>>>
>>> +1 to what AlexR said. I think there is indeed a bad assumption that 
>>> shards just forward data to the coordinating node, this is not the case.
>>>
>>> On Thu, Dec 18, 2014 at 1:09 AM, AlexR wrote:
>>>>
>>>> if you take a terms aggregation, the heavy lifting of the aggregation 
>>>> is done on each node then aggregated results are combined on the master 
>>>> node. So if you have thousands of nodes and very high cardinality nested 
>>>> aggs the merging part may become a bottleneck but cost of doing actual 
>>>> aggregation in most cases is far higher than cost of merging results from 
>>>> reasonable number of shards. So in practice I think it balances pretty 
>>>> well. Of course you are not limited to one master to handle concurrent 
>>>> requests
>>>>
>>>> On Wednesday, December 17, 2014 4:12:44 PM UTC-5, Yifan Wang wrote:
>>>>>
>>>>> I thought ES only "Collect" on individual shards, and "Reduce" on 
>>>>> Client Node (master if you call it), nothing is done at the data node 
>>>>> level.
>>>>>
>>>>> On Tuesday, December 16, 2014 1:31:30 PM UTC-5, AlexR wrote:
>>>>>>
>>>>>> ES already doing 

Re: Is ElasticSearch truly scalable for analytics?

2014-12-17 Thread AlexR
If you take a terms aggregation, the heavy lifting of the aggregation is
done on each node, and then the aggregated results are combined on the master
(coordinating) node. So if you have thousands of nodes and very high-cardinality
nested aggs, the merging part may become a bottleneck, but the cost of doing the
actual aggregation is in most cases far higher than the cost of merging results
from a reasonable number of shards. So in practice I think it balances pretty
well. Of course, you are not limited to one master to handle concurrent requests.

On Wednesday, December 17, 2014 4:12:44 PM UTC-5, Yifan Wang wrote:
>
> I thought ES only "Collect" on individual shards, and "Reduce" on Client 
> Node (master if you call it), nothing is done at the data node level.
>
> On Tuesday, December 16, 2014 1:31:30 PM UTC-5, AlexR wrote:
>>
>> ES already doing aggregations on each node. it is not like it is shipping 
>> row level query data back to master for aggregation. 
>> In fact, one unpleasant effect of it is that aggregation results are not 
>> guaranteed to be precise due to distributed nature of the aggregation for 
>> multibucket aggs ordered by count such as terms
>
>



Is ElasticSearch truly scalable for analytics?

2014-12-16 Thread AlexR
ES is already doing aggregations on each node; it is not shipping row-level
query data back to the master for aggregation.
In fact, one unpleasant effect of this is that aggregation results are not
guaranteed to be precise, due to the distributed nature of the aggregation, for
multi-bucket aggs ordered by count, such as terms.



Re: performance issues

2014-12-08 Thread AlexR
You could try specifying multiple sub-fields in a multi-field mapping, with string type
(or date type) and different formats. I am not sure it is going to work, though; I
typically do this kind of stuff in the actual data. Maybe something like this (for the
sub-fields, perhaps use date instead of string if string does not invoke the formatter):
"timestamp": {
  "type": "date",
  "format": "date",
  "fields": {
    "year": {
      "type": "string",
      "format": ""
    },
    "year-month": {
      "type": "string",
      "format": "-MM"
    }
  }
}


I would do it in the data (it makes documents bigger, but gives you complete freedom
to define your dimensions):

{
  ...
  "callStartTime": {
    "timestamp": "full timestamp",
    "time": "rounded to seconds",
    "weekOfMonth": 3,
    "month": 11,
    "year": 2014
  }
}

Then you can choose not to index the full timestamp at all and index the rest.
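A minimal sketch of computing those dimension fields at indexing time, in Python. The function name is hypothetical, "rounded to seconds" is implemented as truncation, and weekOfMonth uses an assumed definition (days 1-7 are week 1, 8-14 are week 2, and so on); adjust it to whatever calendar rule you actually need.

```python
from datetime import datetime

def derive_time_dimensions(ts):
    """Hypothetical helper: build denormalized time dimensions for one timestamp."""
    return {
        "timestamp": ts.isoformat(),
        "time": ts.replace(microsecond=0).isoformat(),  # truncated to whole seconds
        "weekOfMonth": (ts.day - 1) // 7 + 1,           # assumed rule: days 1-7 -> week 1
        "month": ts.month,
        "year": ts.year,
    }

doc = {"callStartTime": derive_time_dimensions(datetime(2014, 11, 18, 14, 17, 6, 245000))}
```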

If your histogram is based on "absolute" date/time rather than date/time relative to
today, you could use a terms aggregation instead of ranges, which should be faster,
potentially much faster.




Re: performance issues

2014-12-07 Thread AlexR
And if you provide plenty of memory for caching filters and fields (an 8G heap for
111M records with aggregations does not seem enough), plus OS memory for caching
data files (and/or use SSDs), parallel calculation on multiple shards should provide
a much better improvement than 50%. It may not be exactly linear, but in my opinion
it should be at least 3-4 times when going from 1 to 6 shards, assuming you have more
than 6 cores. The memory pressure you mention needs to be removed too. Analyze the
stats, but I suspect 8G is just not enough in your case.

It would be interesting to see whether aggregating on a timestamp rounded to the
date would improve things on its own.


On Sunday, December 7, 2014 5:14:17 AM UTC-5, msbr...@gmail.com wrote:
>
> How many docs do you expect your histogram will aggregate? Most of your 
>> 111M? If so with just one shard and one thread doing the work it is bound 
>> to be pretty slow. 
>>
>
> Expected aggregated records are 78mio. After reindexing with 6 shards per 
> index the query time reduced by ~50%. The result was surprising: someone 
> wrote several shards on a single disk have less effect, because they share 
> the same i/o. But I should mention the threading effect. Are there 
> recommendations about shard size vs shard count?
>  
>
>> Also have you tried moving your not missing filter out of the agg into 
>> the query filter and also just using > 0 instead of not missing. Also 
>> reducing precision of the timestamp could possibly help
>
>
> Removing the missing filter out of the query gives more speed. I cannot 
> remember why I used this missing filter. In current test setup the target 
> result set is identical, even if using 'missing filter'. Is there need to 
> use 'missing filter' here? What happens, if field 'duration' is missing or 
> null in some records?
>
> What is your recommendation to timestamp? Should I replace 
>
> 2014-01-15T14:17:06.245+01:00
>
> with less accuracy in minutes
>
> 2014-01-15T14:17:00.000+01:00
>
> ? Would this affect the field data cache?
>



Re: performance issues

2014-12-07 Thread AlexR
The missing filter is fairly costly. I do not believe you need it, as > 0 should
take care of excluding nulls.
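A tiny Python model of that reasoning (an illustration, not Elasticsearch code; the document shapes are made up): a range condition with gt 0 can only match documents that actually have a value, so a separate "not missing" check adds nothing. Note that it also drops documents whose duration is 0 or negative, which is fine only if such values are meaningless for you.

```python
def matches_gt(doc, field, lower):
    """Model of a range filter with "gt": absent or null values never match."""
    value = doc.get(field)
    return value is not None and value > lower

docs = [{"duration": 42}, {"duration": None}, {}, {"duration": 0}]
matched = [d for d in docs if matches_gt(d, "duration", 0)]  # [{"duration": 42}]
```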

One thread can act on one shard at a time, so the only way you can
parallelize your query is by splitting the index into more shards, letting multiple
threads do parallel work on smaller shards. So if your server has, say,
16 cores, you may consider roughly the same number of shards (maybe a bit
fewer).
If it is IO-bound rather than CPU-bound, more memory for OS-level caching
and probably a bigger ES heap could help, as well as faster
storage: SSDs work great with ES, and at some point you may need
several nodes.
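The one-task-per-shard model described above can be sketched with Python's concurrent.futures. This only illustrates the fan-out/reduce structure: CPython threads do not run pure-Python code in parallel, and Elasticsearch's own thread pools differ in detail. Each shard produces a partial result (count and sum), and the partials are reduced into an average; with one shard there is only one task, however many cores sit idle.

```python
from concurrent.futures import ThreadPoolExecutor

def aggregate_shard(values):
    """Partial aggregation computed by one task for one shard: (count, sum)."""
    return len(values), sum(values)

def split_into_shards(values, n_shards):
    """Spread documents round-robin over n_shards lists (a stand-in for hash routing)."""
    return [values[i::n_shards] for i in range(n_shards)]

def parallel_avg(shards, max_workers):
    """Run one task per shard, then reduce the partial results."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        partials = list(pool.map(aggregate_shard, shards))
    count = sum(c for c, _ in partials)
    total = sum(s for _, s in partials)
    return total / count if count else None

values = list(range(1, 101))
avg_one_shard = parallel_avg(split_into_shards(values, 1), max_workers=16)  # 50.5, one task
avg_six_shards = parallel_avg(split_into_shards(values, 6), max_workers=6)  # 50.5, six tasks
```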

I believe reducing date precision would decrease the number of unique terms in
the index and may help with the histogram. Say, if your histogram needs only date
precision and not time, I would not even index the time part (note that you can
use a multi-field mapping if you need both the precise and the date-rounded timestamp).
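A quick Python illustration of the unique-values argument, using synthetic data (one event every 1.5 seconds for one day; the function is hypothetical): truncating timestamps to the minute or the day shrinks the number of distinct values a histogram or terms aggregation has to deal with. How much this helps in Elasticsearch depends on how the date field is indexed and on field data, so treat it as intuition rather than a benchmark.

```python
from datetime import datetime, timedelta

def distinct_values(timestamps, unit):
    """Count distinct values after truncating each timestamp to `unit`."""
    if unit == "millisecond":
        keys = timestamps
    elif unit == "minute":
        keys = [t.replace(second=0, microsecond=0) for t in timestamps]
    elif unit == "day":
        keys = [t.date() for t in timestamps]
    else:
        raise ValueError("unsupported unit: " + unit)
    return len(set(keys))

start = datetime(2014, 1, 15)
events = [start + timedelta(milliseconds=1500 * i) for i in range(57_600)]

counts = {u: distinct_values(events, u) for u in ("millisecond", "minute", "day")}
# {"millisecond": 57600, "minute": 1440, "day": 1}
```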





Re: performance issues

2014-12-06 Thread AlexR
How many docs do you expect your histogram will aggregate? Most of your 111M? 
If so with just one shard and one thread doing the work it is bound to be 
pretty slow.

Also have you tried moving your not missing filter out of the agg into the 
query filter and also just using > 0 instead of not missing. Also reducing 
precision of the timestamp could possible help



Re: Is there any way to prevent ES from disclosing exception details in REST response?

2014-09-17 Thread AlexR
I may be missing something, but in the filter the check for an error will need
to be done after calling chain.doFilter(req, resp); (otherwise we would not know
the status, which is set by the NodeServlet). At that point it is too late
to do anything about the response body if the output stream was already written to.
That's the reason for capturing the stream in memory using a response wrapper,
and then writing or not writing it to the real response object, depending on
the status.
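The buffering idea can be sketched as a Python WSGI middleware. This is an analogue of a Java servlet filter with a response wrapper, not the servlet API itself, and the names, the generic error body and the min_hidden_status threshold are hypothetical. The wrapped application's status and body are captured first; only once the status is known does the middleware decide whether to pass the body through or substitute a generic message.

```python
def hide_error_details(app, generic_body=b'{"error":"request failed"}', min_hidden_status=500):
    """WSGI middleware: buffer the response, then replace the body of error responses."""
    def middleware(environ, start_response):
        captured = {}
        chunks = []

        def buffering_start_response(status, headers, exc_info=None):
            # Do not send anything yet; just remember what the app wanted to send.
            captured["status"], captured["headers"] = status, headers
            return chunks.append  # also buffer the legacy write() callable

        result = app(environ, buffering_start_response)
        try:
            chunks.extend(result)  # the whole body is now in memory
        finally:
            if hasattr(result, "close"):
                result.close()

        status = captured["status"]
        if int(status.split()[0]) >= min_hidden_status:
            body = generic_body
            headers = [("Content-Type", "application/json"),
                       ("Content-Length", str(len(body)))]
        else:
            body = b"".join(chunks)
            headers = captured["headers"]
        start_response(status, headers)
        return [body]

    return middleware

# Usage with two fake WSGI apps standing in for the real backend.
def failing_app(environ, start_response):
    start_response("500 Internal Server Error", [("Content-Type", "text/plain")])
    return [b"stack trace with internal details"]

def ok_app(environ, start_response):
    start_response("200 OK", [("Content-Type", "application/json")])
    return [b'{"hits":{"total":1}}']

sent = []
def record_start_response(status, headers, exc_info=None):
    sent.append(status)

hidden = hide_error_details(failing_app)({}, record_start_response)
passed = hide_error_details(ok_app)({}, record_start_response)
```

Lowering min_hidden_status to 400 would also hide client-error details, such as the echo of a malformed query; buffering costs memory proportional to the response size.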

On Tuesday, September 16, 2014 3:10:25 AM UTC-4, Jörg Prante wrote:
>
> You could just check for the response code 500, and you're done, no need 
> to capture streams.
>
> Jörg
>
>



Is there any way to prevent ES from disclosing exception details in REST response?

2014-09-15 Thread AlexR
We expose the ES _search endpoint directly to consumers. When our REST API gets
scanned for security vulnerabilities, the scanner complains about ES returning exception
details. For example, a malformed query will be included in the response
along with the exception. While this is more or less harmless, the tool
reports various injections and internals disclosures. I would like to
be able to turn off the error message in the response (or substitute a
generic message) in production, while keeping the normal response logic in
development.

Is there any way I can do it?



Re: Sense on github abandoned?

2014-04-16 Thread AlexR
Well, this is perhaps too strong a statement. AFAIK SOLR does not have
any comparable front-end, and not many would abandon Elastic because Marvel
is a commercial product.

I respect the Elastic team trying to monetize its product(s), and they do it in a
rather nice way. So I would be perfectly happy paying for Sense. Make Sense
available for purchase as a separate product (perhaps with two
licenses, development/personal and corporate) for a reasonable price on the
Chrome marketplace or elsewhere. I would say many developers would be happy
to pay the price, and if a corporation needs it, it can license a bunch as
well. It could still be free when bundled inside Marvel on DEV boxes...

Tying it to Marvel definitely reduces choices and creates inconveniences,
hiding this very useful product that can excel on its own merits. I doubt
it will help dramatically in selling Marvel (which will hopefully be a
success due to its own value).

 

On Wednesday, April 16, 2014 10:43:59 AM UTC-4, jrizzi1 wrote:
>
> What this essentially does is limit a developer's options 
>
> I went to my boss, and laid out the plans for implementing ES, and told 
> them 
> there was no cost, open-source 
>
> Now i have to go back and explain we need licensing on our production VM 
> if 
> we need to use sense on that VM, we don't need marvel, its an internal app 
> to a department of 100 or less users 
>
> If i originally had laid out a plan with costs/licensing agreements, it 
> would have had to go to a guidance council for approval of the license, 
> department cost approval, and more than likely would have been ruled out 
> as 
> an option , and gone with SOLR instead 
>
>
>
>



Indexing dates as text with synonyms for month and maybe 2/4 digit year

2014-04-16 Thread AlexR
Hi,

Is there a filter I can use when indexing a date (ISO date format without
time) as a text field?
By default it is split on "-", and I would like to keep it intact and add the month
name as a synonym for the month number.

On another note, is there any way to keep it from being split into separate
tokens when it is fed into the _all field (for which I use the standard tokenizer +
stemming and a few other things)?

Thanks,
Alex



Embedded Elasticsearch on shutdown: java.lang.IllegalArgumentException: Illegal shift value, must be 0..63

2014-04-07 Thread AlexR
I am not sure it can be done with context listeners, particularly when shutting
down Tomcat. I do not believe Tomcat will wait for listeners to complete: they are
fired asynchronously and Tomcat exits without waiting, but I am not positive of
that. I was going to test it but have not had a chance.



Re: Finding "Singular" matches when searching a "Plural" term

2014-04-06 Thread AlexR
Hi Jörg, 

Is a release of your plugin for ES 1.0.x and 1.1.x available?
How does its performance compare to kstem?

Thanks
Alex



Re: Sense on github abandoned?

2014-04-02 Thread AlexR
If it is a matter of paying for Sense, I would vote for a paid Chrome extension
at a reasonable price, so people who need Sense can purchase it independently
from Marvel.



Suitability of Aggregation framework for traditional OLAP style analytics

2014-03-12 Thread AlexR
I am having an issue with the aggregation framework in one particular aspect:
it does not handle _missing and _other values the way they were handled in
facets (_other was not handled in any of the stats facets either, just in terms facets).
Traditional OLAP would slice/roll up the same data set by various
dimensions, so roll-up counts/totals come out the same even if we are
grouping on a field which may be null for some records.

With the aggregation framework there is no way to do it except for an
exceedingly convoluted use of the "missing" aggregation (just try to have
missing values aggregated as part of the overall bucket set when doing
multi-level aggregation).
You can find a lot more details, an example and my proposal here:
https://github.com/elasticsearch/elasticsearch/issues/5324
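A small Python sketch of the roll-up property being asked for, using toy documents shaped like the caseType example in this post. The helper names and the "_missing" key are hypothetical (the key is borrowed from the facet terminology, not from any aggregation API). If documents with a null or absent grouping value get their own bucket, bucket counts add up to the total, as they would in an OLAP roll-up.

```python
from collections import Counter

def get_path(doc, path):
    """Follow a dotted path such as "caseType.id"; None if any level is absent or null."""
    value = doc
    for key in path.split("."):
        if not isinstance(value, dict):
            return None
        value = value.get(key)
    return value

def group_counts(docs, path, include_missing):
    """Count docs per value of `path`, optionally with a bucket for missing values."""
    counts = Counter()
    for doc in docs:
        value = get_path(doc, path)
        if value is not None:
            counts[value] += 1
        elif include_missing:
            counts["_missing"] += 1
    return counts

docs = [
    {"caseNumber": 123, "caseType": {"id": 10, "name": "Civil"}},
    {"caseNumber": 124, "caseType": {"id": 20, "name": "Criminal"}},
    {"caseNumber": 125, "caseType": None},
    {"caseNumber": 126},
]

without_missing = group_counts(docs, "caseType.id", include_missing=False)  # covers 2 of 4 docs
with_missing = group_counts(docs, "caseType.id", include_missing=True)      # covers 4 of 4 docs
```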

Unfortunately it did not get any reply from the development team, so I can
only assume they are not convinced or did not read it (it would be nice if
they at least said so; maybe they missed it altogether).

I do not want to replace null values with some fake values representing
null and have them bleed all over the applications consuming JSON data from
Elastic. So if Elastic is not going to handle missing values, what are my options
(apart from the option described in my proposal on GitHub)?

Could I use "null_value" in my index mapping for nullable fields? Will these values
be used for aggregation rather than _source? Even if they are, will it work
when the null value is one of the objects and I am aggregating on that object's
properties?
(i.e. the case type is {caseNumber:123, caseType:{id:10, name:'Civil'}}, I
am aggregating on caseType.id, and caseType could be null)





JDK 7 Issues Question

2014-03-03 Thread AlexR
A few times on this newsgroup I noticed that serious issues are mentioned
when running ES under a JDK 7 update greater than u25, and that they should be
fixed in u60.
Could anyone in the know elaborate on the issue? What is its nature, does it
occur on all platforms (our deployment target is the 64-bit Server JVM, JDK 7u51,
on CentOS 6), and does it affect 1.0.x builds?

Thank you,
Alex
 
 



Does Scan API support doc key sort order

2014-03-03 Thread AlexR
Hello,

I need to compare ES with the source data, and I would like to do it in one pass
if possible, which means I need to pull my data from ES in doc key
order.
Is it possible to do this with the scan API, or is the scan order undefined (based on
how the data is stored in Lucene segments)?

thank you,
Alex



Re: How can I merge the results of two aggregations?

2013-12-25 Thread AlexR
I have not looked at aggs, but the terms facet can run against multiple fields.
